================================================================================
LECTURE 001
================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 1 - Class Introduction & Logistics, Andrew Ng
Source: https://www.youtube.com/watch?v=PySo_6S4ZAg

---

Transcript

[00:00:06] Okay, hey everyone, morning, welcome to CS230, Deep Learning. So many of you know that deep learning these days is the latest, hottest area of computer science, of AI. Arguably deep learning is the latest, hottest area of, you know, all human activity, maybe. But this is the course, CS230 Deep Learning, where we hope that we can help you understand the state of the art and become experts at building and applying deep learning systems. [00:00:40] Unlike many Stanford courses, this class will be more interactive than others, because this class runs in the flipped classroom format, where we'll ask you to watch a lot of the videos at home, a lot of the deeplearning.ai content hosted on Coursera, thus preserving the classroom
and discussion section time for much deeper discussions. [00:01:03] So to get started, let me first introduce our teaching team. The co-instructor is Kian Katanforoosh, who is actually one of the co-creators of the Deep Learning Specialization, the deeplearning.ai content that we're using in this class. As for the rest of the teaching team, Swati Dube is the course coordinator, and she has been working with me and others on coordinating, I guess, CS230, also CS229 and CS229A, to make all of these classes run well and give you a relatively smooth, you know, experience. [00:01:42] Younes Mourri is the course adviser, and he also worked closely with Kian and me in creating a lot of the online content that you'll use, and Younes is also head TA for CS229A, which some of you may also be taking. And then we have two co-head TAs: Aarti Bagul, who's worked on machine learning research for a long time, and [inaudible], who is still traveling
back, I think. [00:02:08] And also a large team of TAs. I think about half of the TAs in CS230 have previously TA'd, and their expertise spans everything from applying machine learning to problems in health care, or climate, to applying deep learning to problems in robotics, to problems in computational biology, and so on. So I hope that as you work on your projects this quarter, as part of CS230, you'll be able to get a lot of great advice and help and mentorship from all of the TAs as well. [00:02:46] So the plan for today is: I'm going to spend maybe a little bit of time sharing with you what's happening in deep learning, you know, why deep learning is taking off and how this might affect your careers, and then in the second half Kian will take over and talk a bit more about the projects you work on in this class, and not
just a final term project but, you know, the little machine translation system you build, the face recognition system you build, the art generation system you build, all of the many pretty cool machine learning and deep learning applications that you get to build throughout the course of this quarter, and also share with you the detailed logistics and the plan for the class. [00:03:27] Okay, so, let's see, all right, I'm going to just use the whiteboard for this part. So, um, [00:03:47] you know, deep learning, right, it seems like the media still can't stop talking about it, and it turns out that a lot of the ideas of deep learning have been around for several decades, right, the basic ideas of deep learning have been around for decades. So why is deep learning suddenly taking off now? Why is it, quote, coming out of nowhere, or whatever people say? I think that the main reason that deep learning has been taking off
and why, you know, suddenly all of you hopefully will be able to do really powerful things with it, much more effectively than two or three years ago, is the following. [00:04:29] Um, over the last couple of decades, with the digitization of society, we've just collected more and more data. So for example, all of us spend a lot more time on our computers and smartphones now, and whenever you do things on the phone, you know, that creates data, right? And what used to be represented through pieces of paper is now much more likely a digital record as well. [00:04:55] So if you go take an X-ray, at least in the United States, less so in some other countries, in developing economies, but at least in the United States there's a much higher chance now that your X-ray in the hospital is a digital image rather than a physical piece of film. Or if you order a new marker, right, there's a much higher
chance that the fact that you ordered the marker, you know, off a website is now represented as a digital record, compared to ten years ago, given the state of the global supply chain. Actually, if you ordered ten thousand markers, there's a much higher chance, you know, ten years ago, that the fact that you placed that order was stored on a piece of paper that someone scribbled on, saying ship ten thousand markers to Stanford. But now that's much more likely to be a digital record. [00:05:37] And so the fact that so many pieces of paper are now digital has created data, and for a lot of application areas the amount of data has sort of, you know, exploded over the last twenty years. [00:05:56] But what we found was that if you look at more traditional learning algorithms, traditional machine learning algorithms, the performance of most of them would plateau even as you feed them more and more
data. [00:06:16] So by traditional learning algorithms I mean logistic regression, support vector machines, you know, maybe decision trees, and so on, and it was as if all the learning algorithms didn't know what to do with all the data you could now feed them. But what we started to find several years ago was that if you train a small neural network, its performance may look like that; if we train a medium neural net, the performance may look like that; and if you train a very large neural net, you know, the performance kind of keeps on getting better and better, up to some theoretical limit called the Bayes error rate, which you'll learn about a bit later this quarter. Performance can never exceed 100%, but sometimes there's some ceiling in the performance below that. That's something we've been able to measure on many, many problems.
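The curves Andrew is sketching on the whiteboard can be imitated with a small stdlib-only simulation (mine, not from the lecture): a simple "memorizing" model is trained on data whose labels carry 10% irreducible noise, so its test accuracy climbs as the training set grows but saturates near the 90% ceiling set by the Bayes error rate.

```python
import random

random.seed(0)

NOISE = 0.1      # 10% of labels are flipped, so the Bayes error rate is 10%
N_INPUTS = 20    # inputs are integers 0..19; the clean label is x % 2

def sample(m):
    """Draw m (x, y) pairs with noisy labels."""
    data = []
    for _ in range(m):
        x = random.randrange(N_INPUTS)
        y = x % 2
        if random.random() < NOISE:
            y = 1 - y  # irreducible label noise
        data.append((x, y))
    return data

def fit(train):
    """High-capacity 'memorizing' model: majority-vote label per input value."""
    counts = {}
    for x, y in train:
        counts.setdefault(x, [0, 0])[y] += 1
    return {x: (0 if c[0] >= c[1] else 1) for x, c in counts.items()}

def accuracy(model, test):
    return sum(model.get(x, 0) == y for x, y in test) / len(test)

test = sample(10_000)
accs = {m: accuracy(fit(sample(m)), test) for m in (20, 500, 5000)}
for m, a in accs.items():
    print(f"m={m:5d}  test accuracy={a:.3f}")
```

With more training data, accuracy should rise and then flatten near 0.9: no amount of extra data pushes it past the noise floor, which is the plateau-at-Bayes-error behavior described above.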
And I think that across machine learning [00:07:08] and deep learning broadly, we've not yet hit the limits of scale, and by scale I mean the amount of data you can throw at the problem that's still useful for the problem, as well as the size of the neural networks. [00:07:23] And I think, you know, GPU computing was a large part of how we were able to go from training small to medium, and now to training very large neural networks. And once upon a time, I think, actually a lot of the early work on training neural networks on GPUs was done here at Stanford, right, using CUDA to train neural networks. But one of the lessons we learn over and over in computing is that yesterday's supercomputer is today's, you know, processor on your smartwatch, right? And so what used to be an amount of computation that was accessible only to
you know, large research labs, in Stanford they could spend a hundred thousand dollars on GPUs; today you could rent that on the cloud relatively inexpensively. [00:08:10] And so the availability of relatively large neural network training capabilities has allowed really, students, you know, almost everyone, well, not everyone, but many, many people, to have access to enough computational power to train large enough neural networks to drive very high levels of accuracy for a lot of applications, right. [00:08:35] And it turns out that if you look broadly across AI, you know, I think the mass media, right, newspapers, reporters, use the term AI; I think within academia or within industry you tend to say machine learning and deep learning. But if you look broadly across AI, it turns out that AI has many, many tools that go beyond machine learning, even beyond deep
learning. And if any of you take, you know, CS221, right, Stanford's AI class, great class, you learn about a lot of these other tools of AI. But the reason that deep learning is so valuable today is that if you look across many of the tools of AI, let's say, you know, there's deep learning slash machine learning. And again, you know, neural networks and deep learning mean almost exactly the same thing, right; it's just that as we started to see deep learning rise over the last several years, we found that deep learning was just a much more attractive brand, and so, you know, that's the brand that took off. [00:09:48] But if you take an AI class, you look broadly across the portfolio of tools you have in AI. I think that, you know, I'll often use deep learning, machine learning; sometimes I'll also use a probabilistic graphical model, right, which you learn about in
CS228, also a great course. [00:10:04] Sometimes I use a planning algorithm, you know; when I'm working on a self-driving car, right, you need a motion planning algorithm, you need various planning algorithms. Sometimes I use a search algorithm; sometimes I use knowledge representation. This is one of the technologies, knowledge graphs especially, that is widely used in industry but, I think, often underappreciated in academia. [00:10:29] If you do a web search and a web search engine pulls up a hotel and the list of room prices, and whether there's Wi-Fi, whether there's a swimming pool, that's actually a knowledge graph, a knowledge representation. So it's actually used by many companies, these large databases, but it's maybe underappreciated in academia. Or sometimes I even use game theory.
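The hotel example amounts to querying (subject, relation, object) triples. Here is a minimal sketch of that idea; the hotel name, relations, and values are invented for illustration, not taken from the lecture or any real system:

```python
# A toy knowledge graph stored as (subject, relation, object) triples, the kind
# of structured record behind a search result listing a hotel's prices and amenities.
TRIPLES = [
    ("Hotel Azul", "type", "hotel"),
    ("Hotel Azul", "room_price", "$120/night"),
    ("Hotel Azul", "has_amenity", "wifi"),
    ("Hotel Azul", "has_amenity", "swimming pool"),
    ("Cafe Verde", "type", "cafe"),
]

def query(subject, relation):
    """Return every object linked to `subject` via `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

print(query("Hotel Azul", "has_amenity"))  # -> ['wifi', 'swimming pool']
print(query("Hotel Azul", "room_price"))   # -> ['$120/night']
```

Production knowledge graphs index billions of such triples for fast lookup, but the query pattern is essentially the same.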
So if you learn about AI, there is a very large portfolio of many different tools you will see. [00:10:56] But what has happened over the last several years is, if you go to a conference on probabilistic graphical models, right, if this is time and this is performance, you see that, you know, every year probabilistic graphical models work a little bit better than the year before. If you go to the UAI conference, the Uncertainty in AI conference, maybe one of the leading conferences, maybe the leading one, on PGMs, you see that every year, you know, researchers publish papers that are better than the year before; the field is steadily marching forward. Same for planning: if you go to AAAI or something, you see, you know, the field is advancing, search algorithms are getting better. [00:11:39] Knowledge representation is getting better, game theory algorithms are getting better, and so the field of AI marches
forward across all of these different disciplines. [00:11:47] But the one that has taken off, you know, incredibly quickly is deep learning, machine learning. And I think a lot of this progress was initially driven by scale, scale of data and scale of computation, and the fact that we can now get tons of data, train a huge neural network, and get good performance. But more recently it has also been driven by the positive feedback loop of seeing early traction in deep learning, thus causing a lot more people to do research on deep learning algorithms, and so there's been tons of algorithmic innovation in deep learning in the last several years, and you hear a lot about algorithms that were, you know, relatively recently invented, as well, right. [00:12:32] And so really, I think that initially the twin forces of scale of data and scale of computation, and now the triple forces that also include a lot of
algorithmic innovation, and massive investment, are continuing to make deep learning make tremendous progress. [00:12:46] And so in CS230 we have, you know, I think two main goals. The first is to have you become expert in the deep learning algorithms, have you learn the state of the art, have you have deep technical knowledge of the state of the art in deep learning. And second is to give you the know-how to apply these algorithms to whatever problems you want to work on. [00:13:22] So one of the things I've learned, so I think, you know, actually some of you know my history, right: so, you know, I worked at Stanford for a long time, then started and was leading the Google Brain team, which did a lot of projects at Google, and I think the Google Brain team, you know, built from scratch, was arguably the leading force for helping Google go from what was already a
great internet company into, today, a great AI company. And then there's the time that I spent at Baidu in China, leading Baidu's AI group, which kind of helped Baidu go from also what was already a great company into, today, you know, what many people say is China's greatest AI company. [00:14:02] And I think through working on many projects at Google, many projects at Baidu, and now leading Landing AI, helping many companies on many projects, and running around to different companies and seeing the many different machine learning projects they have, I think I've been fortunate to learn a lot of lessons, not just about the technical aspects of machine learning but about the practical know-how of applying machine learning. [00:14:28] And I think that what you can learn from, you know, the internet, or from purely academic sources, or from reading research papers, is a lot of the technical aspects of
machine learning and deep learning. [00:14:40] But there are a lot of other practical aspects of how to get these algorithms to work that, actually, I do not know of any other academic course that kind of goes into great depth teaching, right? There might be one, but I'm not sure. [00:14:58] But one of the things that we hope to do in this class is to not just give you the tools but also give you the know-how on how to make them work, right. And I think, you know, I've spent a lot of time thinking about this. So actually, late last night, I stayed up very late last night reading this new book by John Ousterhout on, um, software architecture, right. And I think that there's a huge difference between, you know, a junior software engineer and a senior software engineer. Maybe everyone understands the C++ or the Python or the Java syntax; yeah, you can get that from, you know, a book. You just figure out, hey, this is
how c-plus [00:15:30] just figure out hey this is how c-plus this works inside job where else is how [00:15:32] this works inside job where else is how Python numpy works but it's often the [00:15:35] Python numpy works but it's often the high level judgment decisions of how the [00:15:38] high level judgment decisions of how the architecture system what abstractions do [00:15:41] architecture system what abstractions do you use how do you define interfaces [00:15:43] you use how do you define interfaces that defines the difference between a [00:15:45] that defines the difference between a really good software engineer versus you [00:15:47] really good software engineer versus you know a less experienced software [00:15:48] know a less experienced software engineer it's not understanding c-plus [00:15:50] engineer it's not understanding c-plus or syntax and I think in the same way [00:15:53] or syntax and I think in the same way today there are lots of ways for you to [00:15:56] today there are lots of ways for you to learn the technical tools of machine [00:15:59] learn the technical tools of machine learning and deep learning and you will [00:16:00] learning and deep learning and you will learn that in this class you know you [00:16:02] learn that in this class you know you learn how to train a neural network you [00:16:04] learn how to train a neural network you learn the latest optimization algorithms [00:16:06] learn the latest optimization algorithms you understand deeply what the content [00:16:08] you understand deeply what the content is whether recurrent neural network [00:16:10] is whether recurrent neural network whereas when lsdm is you you understand [00:16:12] whereas when lsdm is you you understand what intention mod allows you you learn [00:16:14] what intention mod allows you you learn all of these things in great detail your [00:16:16] all of these things in great detail your work impression could be vision [00:16:17] work impression could be 
vision nationally entrusting speech and so on [00:16:19] nationally entrusting speech and so on but I think one other thing that is [00:16:21] but I think one other thing that is relatively unique to this class and to [00:16:26] relatively unique to this class and to that I guess the the things you see on [00:16:29] that I guess the the things you see on the defender AI course are websites as [00:16:31] the defender AI course are websites as what's the things with doing cause is [00:16:33] what's the things with doing cause is trying to give you the practical [00:16:35] trying to give you the practical know-how so that when you're building a [00:16:37] know-how so that when you're building a machine learning system you can be very [00:16:38] machine learning system you can be very efficient in deciding things like should [00:16:42] efficient in deciding things like should you collect more data or not right and [00:16:44] you collect more data or not right and the answer is not always yes I think I [00:16:46] the answer is not always yes I think I think um with [00:16:48] think um with I think that many of us try to convey [00:16:51] I think that many of us try to convey the message that having more data is [00:16:54] the message that having more data is good right and that's actually more data [00:16:56] good right and that's actually more data pretty much never hurts but I think the [00:16:58] pretty much never hurts but I think the message of big data has also been [00:17:00] message of big data has also been overhyped and sometimes it's actually [00:17:02] overhyped and sometimes it's actually not worth your while to couldn't collect [00:17:03] not worth your while to couldn't collect more data right but so when you're [00:17:06] more data right but so when you're working a machine learning project and [00:17:08] working a machine learning project and if you are either doing it by yourself [00:17:10] if you are either doing it by yourself or leading a team your 
[00:17:12] ability to make a good judgment decision about whether you should spend another week collecting more data, or spend another week searching over hyperparameters or tuning the parameters of your neural network, is the type of decision that, if you make it correctly, can easily make your team 2x or 3x or maybe 10x more efficient. And so one thing we hope to do in this class is more systematically impart this type of knowledge to you.

[00:17:39] I actually visit lots of machine learning teams around Silicon Valley to learn about what they're doing, and recently I visited a company that had a team of 30 people trying to build a learning algorithm. The team of about 30 people had been working on the learning algorithm for about three months and had not yet managed to get it to work, so they'd basically not succeeded after three months. One of my colleagues took the data set... (oh yeah, can you pause the broadcasting? Don't say anything bad. All right.) So one of my colleagues took the data set home, spent one long weekend, three days, working on this problem, and was able to build a machine learning system that outperformed what this group of 30 people had been able to do in about three months. So is that a 10x difference? No, that's more than a 10x difference. And a lot of the difference between the great machine learning teams and the less experienced ones is actually not just whether you know how to implement an LSTM in TensorFlow or Keras or whatever.
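The "collect more data or tune the model" judgment call mentioned above is often made by looking at learning curves: train on increasing subsets of the data and watch how training and validation error move. Here is a minimal sketch in NumPy, using synthetic data and a plain least-squares model; this is my own illustration of the idea, not code from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task: y = 3x + noise (illustrative data only).
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=1000)

X_train, y_train = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]

def fit_mse(n):
    """Fit least squares on the first n training points; return (train, val) MSE."""
    A = np.hstack([X_train[:n], np.ones((n, 1))])        # design matrix with bias
    w, *_ = np.linalg.lstsq(A, y_train[:n], rcond=None)  # closed-form fit
    def mse(Xs, ys):
        pred = np.hstack([Xs, np.ones((len(Xs), 1))]) @ w
        return float(np.mean((pred - ys) ** 2))
    return mse(X_train[:n], y_train[:n]), mse(X_val, y_val)

sizes = [10, 50, 200, 800]
curve = {n: fit_mse(n) for n in sizes}
for n, (tr, va) in curve.items():
    print(f"n={n:4d}  train MSE={tr:.3f}  val MSE={va:.3f}")

# Reading the curve: if validation error has flattened out close to training
# error, another week of data collection likely buys little, and model or
# hyperparameter work is the better bet; a large, still-shrinking train/val
# gap suggests more data will help.
```

The same diagnostic works with any model; the point is that the data-versus-tuning decision can be made from a plot rather than by guessing.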
[00:19:17] You have to know that, but there are actually other things as well, and I think Kian and I and the teaching team are looking forward to trying to systematically impart a lot of this know-how to you, so that hopefully, someday, when you're leading a team of machine learning engineers or deep learning engineers, you can help direct the team's efforts more efficiently.

[00:19:38] Oh, actually, if any of you are interested: how many of you have heard of Machine Learning Yearning? Machine Learning Yearning? Wow, almost none of you. Okay, interesting. So, if this is your first machine learning class, it may be too advanced for you, but if you've had a little bit of machine learning background: Machine Learning Yearning is a booklet I've been writing; I've been working on it slowly in draft form.
[00:20:14] Machine Learning Yearning is my attempt to gather the best principles for turning machine learning from a black art into a systematic engineering discipline. I actually just finished the whole draft last weekend, and we'll email it to enrolled students, maybe later today, I'm not sure. So if you want a copy, go to the website and enter your email address, and when we send out the book draft you'll get a copy. I tend to write books and then just post them on the internet for free, but this one we'll just email out to people, so you can get it if you go to the website. And I think this course will talk about a lot of the principles of Machine Learning Yearning, but give you much more practice than just reading a book might.

[00:21:08] Um, so, let's see. Okay, so Kian will give a greater overview of what we'll cover in this class, but one of the principles I've learned as well is... so I think some of you know my background: I was a co-founder of Coursera, so I've spent a long time really thinking a lot about education, and I think CS230 represents Kian's and my and the teaching team's best attempt to deliver a great on-campus deep learning course.

[00:21:52] And so the format of this class is what's called a flipped classroom, and what that means is... you know, I've taught on SCPD for many, many years, and I found that even for classes like CS229 or other Stanford courses, often students end up watching the videos at home.
[00:22:18] And I think with the flipped classroom, what we realized was: if many students are watching videos of these lectures at home anyway, why don't we spend a lot of effort to produce higher-quality videos that are more time-efficient for you to watch at home? And so our team created the deeplearning.ai videos, kind of the best videos we knew how to create on deep learning, and they're now hosted on Coursera. I actually think it will be quite time-efficient for you to watch those videos, do the online programming exercises, and do the online quizzes. What that does is it preserves the class time, both the weekly sessions that we meet right here on Wednesdays and the TA discussion sections on Fridays, for much deeper interactions and much deeper discussions. And so the format of the class is that we ask you to do the online content created by deeplearning.ai and hosted on Coursera.
[00:23:17] Then in class, both in the meetings with Kian and me, which Kian and I will split roughly 50/50, as well as in the deeper small-group discussion sections you have with the TAs, you get to spend much more time interacting with the TAs, interacting with Kian and me, and going deeper into the material than with just the online content by yourself. That will also give us more opportunities to give you advanced material that goes beyond what's hosted online, as well as give you additional practice with these concepts.

[00:24:03] And so, let's see. Yeah, I'll finish up with two more thoughts and then hand it over to Kian. I think machine learning, deep learning, AI, whatever you call it, is changing a lot of industries. I think AI is the new electricity: much as the rise of electricity, starting about 100 years ago in the United States, transformed every industry. The rise of electricity transformed agriculture, because finally we had refrigeration, right? It transformed healthcare: imagine going to a hospital today where there's no electricity; how do you even do that? Computers, medical devices... you can't even run a healthcare system. It transformed communications, through telecom, through the telegraph initially, and now so much of communications really needs electricity. Electricity transformed every major industry, and I think machine learning and deep learning have reached a level of maturity where we see a surprisingly clear path for them to also transform pretty much every industry. And I hope that through this class, after these next ten weeks, all of you will be well qualified to go into these different industries and help transform them as well.

[00:25:18] And after this class, I hope that you'll be well qualified to, like, get a job at some of the big shiny tech companies that have large AI teams. But I think a lot of the most exciting work to be done today is still to go into the less shiny industries that do not yet have AI and machine learning, and to take it to those areas. Actually, earlier I was chatting with a student who works in cosmology... who was it? Oh, at the back. He was commenting that cosmology needs more machine learning, and maybe he will be the one to take a lot of the ideas of deep learning into cosmology. Because even outside the shiny tech areas... you know, maybe since I helped lead the AI transformation at two large tech companies, I feel like I'm done transforming internet search companies.
[00:26:09] And I think it's great that we have those great AI teams, like Google Brain, Baidu's AI group, and the great AI teams at other large tech companies; I think that's wonderful. But I think a lot of the important work to be done now, which many of you will do, is to take AI to healthcare, take it to cosmology, take it to all these other industries. I think all of this is worth doing: just like electricity didn't have one killer app, it's useful for a lot of things, and I think many of you will go out after this course and execute many exciting projects, both in tech companies and in other areas, like cosmology, that were not traditionally considered CS areas.

[00:26:53] So, to wrap up with two last thoughts. One of the things that excites me these days... I want to share with you one of the lessons I learned watching the rise of AI in multiple companies. I've spent a long time thinking about what it is that makes a great AI company, and one of the lessons I learned came from hearing Jeff Bezos speak about what it is that makes an internet company. I think a lot of the lessons that we learned from the rise of the internet will be useful: the internet was maybe one of the last major technology waves of disruption, and just as 20 years ago was a great time to start working on the internet, I think today is a great time to start working on AI and deep learning. (So, can we turn on the lights on this side as well? Do I control that? Okay, thank you.)

[00:27:54] Okay, so I want to show you one of the lessons. I really spent a lot of time trying to understand the rise of the internet, because I think it will be useful to many of you as you navigate the rise of machine learning and AI in your own upcoming careers. Which is: one of the lessons I learned was that you can take your favorite shopping mall and build a website for it, and that does not turn your shopping mall into an internet company. Right? You know, my wife and I like Stanford Shopping Center, and Stanford Shopping Center has a website, but even if a great shopping mall sells stuff on a website, there's a huge difference between a shopping mall with a website and a true internet company like Amazon. So what's the difference? About six or seven years ago I was chatting with the CEO of a very large American retailer, and at that time he and the CIO were saying to me: look, Andrew, we have a website, we sell things on the website; Amazon has a website, Amazon sells things on the website; it's the same thing. But of course it's not.
[00:29:08] And today, this particular large American retailer's future existence is actually a little bit in question, partly because of Amazon. So one of the lessons I learned, very much influenced by Jeff Bezos, is that what defines an internet company is not just whether you have a website; instead, it is whether you have organized your team, or your company, to do the things that the internet lets you do really well. For example, internet teams engage in pervasive A/B testing: we know that we can launch two versions of a website, just see which one works better, and so learn much faster. A traditional shopping mall can't launch two shopping malls in two parallel universes and see which one works better, so it's just so much harder to do that. We also tend to have short shipping times: you can ship a new product every day or every week, and so you learn much faster.
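The "launch two versions and see which one works better" experiment just described comes down to comparing two conversion rates and asking whether the gap is larger than noise. A minimal sketch using a two-proportion z-test; all visitor and conversion numbers below are hypothetical, not from the lecture:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    two website variants (the 'launch two versions' experiment)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical numbers: variant B converts 5.5% vs A's 5.0%, 20k visitors each.
z, p = two_proportion_ztest(1000, 20000, 1100, 20000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In practice a team would fix the sample size and a significance threshold (commonly p < 0.05) before the experiment, then ship the winning variant; this is one way the "learn much faster" loop gets made concrete.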
[00:30:08] Whereas the traditional shopping mall may redesign the shopping mall once every three months, right? And we actually organize our teams differently: we tend to push decision-making down to the engineers, or to the engineers and product managers, because in a traditional shopping mall things move slower; maybe the CEO says something and then everyone just does what the CEO says, and that's fine. But in the internet era we learned that the technology and the users are so complicated that only the engineers and the product managers are close enough to the technology, to the algorithms, and to the users to make good decisions. And so in internet companies we tend to push decision-making power down to the engineers and product managers, and you have to do that in the internet era, because that's how you organize a company, or organize a team, to do the things the internet lets you do really well.

[00:31:10] So that was the rise of the internet. I think we've arrived at the AI era, or the era of AI, machine learning, or deep learning, whatever you want to call it, and we're learning that if you have a traditional company plus a few neural networks, that does not by itself turn the company into an AI company. I think what will define the great AI teams of the future will be whether you know how to organize your own work, and organize your team's work, to do the things that modern machine learning and deep learning and other AI tools let you do really well. And having been at Google and Baidu... I think Google and Baidu are great, and many other companies are thinking this through, but I think even the best companies in the world haven't completely figured out what
are the [00:32:10] completely figured out what are the principles by which to organize AI teams [00:32:12] principles by which to organize AI teams but I think some of them will be that we [00:32:16] but I think some of them will be that we tend to I think that AI teams tend to be [00:32:20] tend to I think that AI teams tend to be very good at a strategic data [00:32:23] very good at a strategic data acquisition and so you see AI companies [00:32:30] acquisition and so you see AI companies or AI teams even even you know do things [00:32:33] or AI teams even even you know do things that may not seem like it makes sense [00:32:35] that may not seem like it makes sense and why do these companies of all these [00:32:37] and why do these companies of all these three products that don't make any money [00:32:38] three products that don't make any money well some of it is the required data [00:32:40] well some of it is the required data that you can monetize through other ways [00:32:43] that you can monetize through other ways right through advertising or through [00:32:45] right through advertising or through learning about users and so there are a [00:32:47] learning about users and so there are a lot of data acquisition strategies that [00:32:50] lot of data acquisition strategies that at the surface level may not make sense [00:32:52] at the surface level may not make sense but actually do make sense if you [00:32:53] but actually do make sense if you understand how this can be married with [00:32:55] understand how this can be married with deep learning algorithms to create value [00:32:57] deep learning algorithms to create value elsewhere and I think that uh AI [00:33:01] elsewhere and I think that uh AI companies tend to organize data [00:33:04] companies tend to organize data differently right ai teams tend to be [00:33:11] differently right ai teams tend to be very good at putting our data together I [00:33:13] very good at putting our data together I think 
[00:33:15] before the rise of deep learning, many companies had fragmented data warehouses, where if you have a big company with 50 different databases, you know, in 50 different divisions, it's actually very difficult for an engineer to look at all that data and put it together to train a learning algorithm to do something valuable. So the leading AI companies tend to have unified data warehouses. And I know we have a large home audience, our SCPD and other home audience here, so if any of you work at a large tech company, you know this is something that many companies are investing in today, to lay the foundation for learning algorithms. AI teams also tend to be very good at spotting pervasive automation opportunities, which is to say, very good at spotting opportunities where, instead of having people do a task, you could have a
[00:34:04] deep learning algorithm do it instead, automating the task. And we also have new job descriptions, which I don't have time to talk about, but just as with the rise of the internet, we started creating a lot of new roles for engineers. I think actually, once upon a time, the world was simple and there was just a software engineering title, but as technology got more complicated, we started to specialize. So that's why, you know, with the internet there were front end, back end, and mobile, right, and then, with increasing specialization of knowledge, other roles: QA, DevOps, IT. And so with the rise of machine learning, we're starting to see the creation of new roles like machine learning engineer and machine learning research scientist, and product managers on AI teams also behave differently than product managers at internet companies. And so one of
[00:35:02] the things we'll revisit a few times throughout this quarter is, and I don't mean to be too corporate, I know that many of you, you know, some of the SCPD audience or online audience, are already working at companies, and many of you, when you graduate from Stanford, will end up maybe starting your own company or joining an existing company, but I think that solving a lot of these questions of how to organize your teams effectively in the AI era will help you do more valuable work. And I think, to make one more analogy, you know, one of the things I hope Kian and I will share with you throughout this quarter is that, just as in the software engineering world, it took us a long time to figure out what agile development is, right, or what the pros and cons are of, you know, the waterfall model versus agile, or what a scrum process is, right, or is code review a good idea? It
[00:35:57] seems like a good idea to me, right? But these practices, after programming languages were created or invented or whatever, we still had to figure out all these ways to help individuals and teams write software effectively. And so if you've worked in, you know, high-performing corporate or industrial AI teams using these software engineering practices, everything from code review to agile to whatever, you know that having a team work effectively to write software is more than everyone knowing C++ syntax or knowing Python syntax. And I think in the machine learning world, we're still in the process of inventing these types of processes. What is the scrum, what is the agile development, what's the equivalent of code review for developing machine learning algorithms? And I think probably this class, more than any other
[00:36:51] class I'm aware of right now, I think, will try to systematically teach you these tools, so that you're not just able to derive a learning algorithm and implement a learning algorithm, but you're actually, you know, very effective in terms of how you go about building these systems.

So, last thing before I pass it to Kian: the other question that I've been asked, I guess, several times this week, let me just preemptively answer. So there are multiple machine learning classes going on at Stanford this quarter, and the other frequently asked question is which of these classes you should take. So let me just address that preemptively, because I've been asked twice already, and the other two classes are also running this quarter. So I think actually what's happened over the last several years at Stanford is that the demand for machine learning education has
[00:37:49] you know, been rising dramatically, because in recent years the majority of CS PhD applicants to Stanford, you know, are applying to do work in machine learning or applying to do work in AI. And I think all of you can kind of see that there's such a shortage of machine learning engineers, right, and I think that shortage should continue for a long time. So I think many people see that if you can learn machine learning, there will be great opportunities for you to do meaningful work: on campus, to take machine learning to other disciplines, or do great research on campus, as well as graduate from Stanford and do very unique work. When I wander around Silicon Valley, I feel like there are so many ideas for great machine learning projects that exactly zero people are seeing through, because there just aren't enough
[00:38:36] machine learning people in the world right now. So by learning these skills, you have many opportunities to be the first one to do something very exciting and meaningful, right? All right, and, um, you've probably read in the newspapers about how much money machine learning people make. I hope a lot of you make a lot of money, but I actually personally don't find that, you know, as exciting. I think that every time there's a major technological disruption, it gives us an opportunity to remake large parts of the world, and I hope that some of you go improve the healthcare system, improve the educational system, maybe, you know, see if we can help preserve the smooth functioning of democracy around the world. I think that really, your unique skills in deep learning will give you opportunities to do that, I
[00:39:22] think hopefully very meaningful work. Um, but because of this massive, massive rising demand for machine learning education: so, for a long time, CS 229, Machine Learning, was the core machine learning class at Stanford, and then CS 230 is actually the newest creation, I think, and the other class we're involved in this quarter is CS 229A. So if you're trying to decide which of these classes to take, I think that these classes are a little bit like Pokémon, right, you really should collect them all. But I think we've been trying to design these classes to actually teach different things and not have too much overlap, and so I have seen students take two classes at the same time, and that's actually fine. There's not that much overlap, so it's fine; you'll actually learn different things if you take
[00:40:26] any two of these classes at the same time. CS 229, Machine Learning, is the most mathematical of these classes; it goes much more into the mathematical derivations of the algorithms. CS 229A, Applied Machine Learning, is much less mathematical but spends a bit more time on the practical aspects; it's actually an easier on-ramp to machine learning, as well as the least mathematical of these classes. CS 230 is somewhere in between: more mathematical than 229A, less mathematical than 229. But what CS 230 focuses on is deep learning, which is just one small subset of machine learning, but it is the hottest subset of machine learning. Whereas there are a lot of other machine learning algorithms, you know, PCA, k-means, recommender systems, support vector machines, that are also very useful, that I use, you know, in my work quite frequently, that we don't teach in CS 230, but that are
[00:41:16] taught in CS 229 or CS 229A. So the unique thing about CS 230 is that it focuses on deep learning. So, I don't know, if you want to list deep learning on your resume, I guess maybe this is the easiest way to do it. I don't know, again, it's not what I tend to optimize for. But I think CS 230 goes the deepest in the practical know-how of how to apply these algorithms. Oh, and I want to set expectations accurately as well, right? So what I don't want is for you guys to go and complain at the end of the quarter that, you know, there wasn't enough math, because that's actually not the point. What has happened in the last decade is that the amount of math you need to be a great machine learning person has actually decreased, I think, and I wanted to do less math in CS 230 but spend more time teaching you the practical know-how of how
[00:42:07] to actually apply these algorithms, right? So, yeah, and I think 229A is the easiest of these classes, 229 is the most technical, and this is the most hands-on and applied; you do a lot of projects on different topics, right? And I think these courses are often the foundation, or some subset of them; these are often the foundational courses that students take, because if you want to, say, learn deep learning, a common sequence is for students to first, you know, learn the foundations of machine learning and deep learning, so you have the foundation first, which then often sets you up to later go deeper into computer vision or natural language processing or robotics or deep reinforcement learning. And so the common sequencing, the common tactic, that Stanford students take is to use these as the foundation. You see a bit of everything, from
[00:43:01] computer vision to natural language processing to speech recognition, you know, a little bit on self-driving cars, but that gives you the foundation to then decide if you want to go deeper into natural language processing or robotics or reinforcement learning or computer vision or something else. This is the common sequencing of classes that students take. Okay, so, um, look forward to spending this quarter with you. Let me just check for any quick questions and then hand it over to Kian.

[00:43:33] [Answering a question about a slide:] Decision making by engineers and product managers: I wrote decision making by engineers there, but really it's engineers and product managers. Oh, and pervasive, sorry, pervasive automation.

[00:44:00] [Question:] So far, what are the most meaningful successes of machine learning that you think have happened already?

So all of you are using
[00:44:12] learning algorithms, probably dozens of times a day, maybe even hundreds of times a day, without knowing it, right? Every time you use a web search engine, there's a learning algorithm that's improving the quality of the search results. There's also a learning algorithm trying to show you the most relevant ads, and this helps those companies actually make all the money. It turns out that actually both Google and Baidu have publicly said that over ten percent of searches on mobile are through voice search, and so I think it's great that you can now talk to your cell phone rather than type on the tiny little keyboard if you want to do a web search on mobile. If you go to, you know, a website like Amazon or Netflix, there are learning algorithms recommending more relevant movies or more relevant products to you. Every time you use your credit card, there's a
[00:45:00] learning algorithm, at probably almost all the companies I'm aware of, kind of figuring out whether it's you using the credit card or whether it's been stolen, you know, a learning algorithm to see if it's a fraudulent transaction or not. Every time you open up your email, the only reason email is even usable is because of your spam filter, which is a learning algorithm; they work much better now than before. And so, I think, you know, one of the amazing things about AI and machine learning is, I love it when it disappears into the background, right? You use these algorithms: you boot up your map application and it finds the shortest route for you to drive from here to there, and there's a learning algorithm predicting what traffic will be like on Highway 101 one hour from
[00:45:45] now. But you don't even need to think that there was a learning algorithm trying to figure out what traffic will be like one hour in the future. It seems pretty magical, right, that you could just use it. You can build all these wonderful products and systems that help people and abstract away a lot of the details. So that's the present. And I think in the near future: most of my PhD students, most of my research group here, work on machine learning for healthcare, where I think we'll make significant inroads, you know, and my team at Landing AI spends a long time with a lot of industries, from manufacturing to agriculture. Another thing I'm excited about is machine learning for education, to give people precisely tailored, recommended, customized content. There's fascinating research done here at Stanford by Chris Piech and a few others on using
[00:46:38] learning algorithms to give people feedback on coding homework assignments. So, sorry, there are so many examples of machine learning, I could talk for quite some time. Yeah, all right, one last question, then I'll hand it over to Kian.

[00:46:54] [Answering a question about the class format:] So the format of the class is that you watch videos created by deeplearning.ai on Coursera, so you'll see me a lot there, but in addition, Kian and I will be having lectures here in this classroom every Wednesday, and that will be, you know, completely new material that is not online anywhere, at least right now. Yeah, and then also, I think the point of the flipped classroom really is that for some of these things it's really more time efficient for you to just learn online. So there's the online content, but what it does is it leaves this classroom time for us to not, you know, deliver the same lecture year after year, but to spend time to get to
[00:47:36] know you, to have more time to answer your questions, and also to give you more in-class practice on these things, right? So there's the Coursera content, but what we do in CS 230 is augment that to give you much deeper practice, more advanced examples, some deeper mathematical derivations, and more practice, so that, you know, you deepen your knowledge of it. And with that, let me hand it over to Kian.

[00:48:14] Yeah, I'm going to get back at him by making noise while he's talking. Okay, okay. Thanks, Andrew. Hi everyone, I'm Kian. We're excited to have you here today, those of you who are in class but also those of you who are SCPD students. We wanted to take a little more time to explain a little bit about the course logistics, what this course is about, and also what it is to be a CS 230 student in Fall 2018. So the course online is structured into five chapters,
So the course online is structured into five chapters, or sub-courses, let's say. [00:49:00] What we will teach you first is: what is a neuron? That's the first thing you need to know. After understanding what a neuron is, you're going to build layers with these neurons, and you're then going to stack these layers on top of each other to build a network, which can be shallow or deep. This is the first course. Unfortunately, just building a neural network is not enough to deploy it — it's not enough to get it to work — so in the second course we're going to teach you the methods that are used to tune these networks in order to improve their performance. This is the second part. [00:49:36] As Andrew mentioned, one thing we're really putting a huge emphasis on in CS 230 is industrial applications and how the industry works in AI. So the third course is going to help you understand how to strategize the project that you'll do throughout the quarter, but also, in general, how AI teams work: you can have an algorithm, and you have to identify why the algorithm works, why it does not work, and, if it doesn't work, which parts inside the algorithm you should improve. [00:50:06] The two last courses, courses four and five, focus on two fields that are defined by two types of algorithms: on the one hand, convolutional neural networks, which have been proven to work very well on images or videos; and on the other hand, sequence models, which include recurrent neural networks, which are applied a lot in natural language processing and speech recognition. So you're going to see all of that. [00:50:34] From the online perspective, we use a specific notation in CS 230: when I say C2M3, it refers to Course 2, Module 3 — so the third module of Improving Deep Neural Networks. Okay.
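The course-one progression just described — a single neuron, then a layer of neurons, then layers stacked into a network — can be sketched in a few lines of NumPy. This is a hypothetical illustration of the idea, not code from the course or its assignments:

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic activation.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # 3 input features

# One neuron: a weighted sum of the inputs plus a bias, through an activation.
w, b = rng.normal(size=3), 0.1
neuron_out = sigmoid(w @ x + b)                 # a single number in (0, 1)

# A layer is many neurons sharing the same input: a matrix-vector product.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 4 neurons, 3 inputs each
layer_out = sigmoid(W1 @ x + b1)                # 4 activations

# Stacking layers gives a (shallow) network: 3 -> 4 -> 1.
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
network_out = sigmoid(W2 @ layer_out + b2)      # final prediction

print(layer_out.shape, network_out.shape)
```

Deeper networks just repeat the "multiply, add bias, activate" step more times; the tuning methods in the second course are about making that stack actually train well.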
And I'd like everyone to go on the CS 230 website after the class to look at the full syllabus for the quarter — check when the midterm is and when the final poster presentation is; the schedule is posted there, so check it out. [00:51:08] And we're going to use the Coursera platform, as you know. On Coursera you will receive an invite on your Stanford email — you should have received it already for Course 1 — in order to access the platform. From the platform you will be able to watch videos, do quizzes, and do programming assignments, and every time you finish one of these courses — so C1 has four modules; when you're done with C1M4 — you will receive a new invite to access C2, and so on. Okay. [00:51:37] Inside CS 230 we're going to use Piazza as the class forum for you to interact with the TAs and with the instructors; you can post privately or publicly, depending on the matter. Okay.
So let's see what one week in the life of a CS 230 student looks like — we're going to do that ten times over the Fall Quarter. So, what is one module? In a module you will watch about ten videos on Coursera, which will take about an hour and a half, and you will do quizzes after watching the videos — that's going to take you about 20 minutes per module. Finally, you will complete programming assignments, which are Jupyter notebooks: you will get cells to test your code, and you also submit your code directly on the Coursera platform. [00:52:27] In one week of class here at Stanford, we will usually have two modules. On top of these two modules, you will come to lecture for a one-and-a-half-hour in-class lecture on an advanced topic that is not taught online, and after that you will have TA sections on Fridays, which are around one hour — a good chance for you to meet other students for your projects.
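The Jupyter workflow mentioned above — fill in a function, then run the notebook's test cells before submitting — looks roughly like this. A hypothetical example of the pattern, not actual assignment content:

```python
# GRADED FUNCTION (hypothetical example of the notebook pattern)
def relu(x):
    """Return x for positive inputs and 0 otherwise — a common
    activation function implemented early in such assignments."""
    return max(0.0, x)

# A test cell like this typically follows each graded function,
# so you can check your code before submitting on the platform.
assert relu(3.0) == 3.0
assert relu(-2.0) == 0.0
print("All tests passed!")
```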
It's also a chance to interact with the TAs directly. [00:52:55] Finally, we also have personalized mentorship this quarter, where every one of you will meet 15 minutes per week with a TA, in order to check in on your project and give you the next steps. So we put a huge emphasis on the project in this class, and — you will see it later — we want you to decide on your teams by this Friday in order to get started as soon as possible; next week you will have your first mentorship meeting with the TAs. Okay, it's gonna be fun. [00:53:25] Assignments and quizzes that are part of modules are due every Wednesday at 11 a.m.
— so, 30 minutes before class, so you can come to class with everything done and understood. And do not follow the deadlines displayed on the Coursera platform; follow the deadlines posted on the CS 230 website. The reason the deadlines are different is that we want to allow you to have late days, and Coursera was not built for late days, so we put the deadlines later on Coursera to allow you to submit even if you want to use a late day. Does that make sense? Okay. [00:54:00] We're also going to use an interactive tool called Mentimeter to check attendance in class and also for you to answer some interactive questions — it's going to start next week; sorry, not with Course 2. [00:54:22] Regarding the grading formula, here it is: you have a small part on attendance, which is two percent of the final grade,
eight percent on quizzes, 25 percent on programming assignments, and a big part on the midterm and on the final project. This is posted on the website if you want to check it. Attendance is taken for in-class lectures, for the 15-minute TA meetings, and for the TA sections on Friday. [00:54:51] You can also get a bonus: we've had students who were very active on Piazza and answered questions for other students, which was great, and they got a bonus — so I encourage you to do the same; maybe we won't need TAs and instructors anymore. [00:55:11] Okay, so I wanted to take a little more time to go over some of the programming assignments that you're going to do this quarter, so that you know where you're going. In about three weeks from now, you're going to be able to translate these pictures here into the numbers that they correspond to in sign language — so it's sign-language translation, from images to the outputs they signify.
You're going to build — at first a logistic regression, and then a convolutional neural network — in order to solve this problem. [00:55:47] A little later, you're going to be a deep learning engineer in a house that is not too far from here, called the Happy House. There's only one rule in this house, and the rule is that no sad person should enter the house — you should avoid that. And because you're the only deep learning engineer who has the knowledge, you're given this task: don't let the sad people in, just let the happy people in. You're going to build a network that will run on a camera in front of the house and that is going to let people in or not. Unfortunately, some people will not get in, and other people will get in because they're happy, and you will save the Happy House at the end of the assignment, hopefully.
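The Happy House task described above is binary image classification. A minimal sketch of the logistic-regression version — flatten the image, take a weighted sum, squash with a sigmoid — using made-up weights and a hypothetical `is_happy` helper (illustrative only, not the assignment's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def is_happy(image, w, b, threshold=0.5):
    """Return True if the classifier lets this person into the Happy House.

    image: (H, W, 3) array of pixel values in [0, 1]
    w:     weight vector with one entry per pixel value
    b:     scalar bias
    """
    x = image.reshape(-1)                  # flatten (H, W, 3) -> (H*W*3,)
    probability_happy = sigmoid(w @ x + b)
    return probability_happy >= threshold

# Tiny made-up example: a 2x2 RGB "image" and random (untrained) weights.
rng = np.random.default_rng(1)
img = rng.random((2, 2, 3))
w = rng.normal(size=img.size)
print(is_happy(img, w, b=0.0))
```

A trained CNN replaces the single weighted sum with stacks of convolutions, but the final decision is the same thresholded probability.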
This is one of the applications that I personally prefer — it's called object detection; you might have heard of it. This is running in real time, and that's what is very impressive. You're going to work on a deep learning architecture called YOLO v2, and YOLO v2 is an object detection algorithm that runs in real time and is able to detect 9,000 object categories, as fast as that — so it's really, really impressive. You have a few links here if you want to check the paper already, but maybe you will need a few more weeks to understand it. [00:57:15] Okay — actually, we can even run it directly on my computer; I think it's going to be fine. [00:57:32] Oh yeah, we can run it. So here you see it's running live on this computer, and you see that if I move, it will find out that I moved — so I cannot escape. Yeah, here it is. Okay. [00:57:52] Okay, a few other projects. First, two weeks from now, you will build an
optimal goalkeeper shot predictor. So, in soccer, you're a goalkeeper and you want to decide where you should shoot the ball in order to make it land on one of your teammates; you're going to find the exact line on the field that tells the goalkeeper where to shoot. [00:58:11] About two weeks from now, in the fourth course on convolutional neural networks, you're going to work on car detection — this is a bigger image; this is exactly the programming assignment. You're going to work on the autonomous-driving application of finding cars, finding stop signs, finding lights, finding pedestrians, and all the objects that are related to road features. Okay, this is pretty cool, and you will generate these images yourself. This is a picture taken from a camera mounted at the front of a car, and it was generated by drive.ai.
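Object detectors like the YOLO v2 model in these assignments compare candidate boxes with intersection over union (IoU). A minimal sketch of that computation, with boxes written as (x1, y1, x2, y2) — an illustrative helper, not the assignment's code:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union 7 -> 1/7
```

Non-max suppression — keeping only the highest-scoring box among those with high mutual IoU — is what turns thousands of raw predictions into the clean boxes seen in the demo.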
You will also build a face-recognition system that is going to first do face verification — is this person the right person? — but also face recognition — who is this person? — which is a little more complex; we're going to go over that together, both online and in lecture. [00:59:09] Art generation — some of you have heard of this; it's an algorithm called neural style transfer, and again, we usually put the papers at the bottom of the slides in case you want to check them yourself for your project. This is a problem where you give a content image, which here is the Golden Gate Bridge, and a style image, which is an image that was painted — usually by someone — or an image from which you want to extract the style, and the algorithm is going to generate a new image: it's going to mix the content of the first image with the style of the second image. [00:59:44] Music generation, which is super fun: you're going to generate jazz music in the fifth course, Sequence Models.
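In neural style transfer, the "style" being mixed in is usually captured by Gram matrices — channel-to-channel correlations of a layer's activations — which the generated image is pushed to match, while a separate content loss compares raw activations with the content image. A minimal sketch with made-up activation shapes (illustrative only, not the assignment's code):

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of one layer's activations.

    features: (C, H, W) activations from some conv layer.
    Returns a (C, C) Gram matrix; matching these between the style
    image and the generated image is what transfers the "style".
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)  # (C, H*W)
    return flat @ flat.T

# Made-up activations: a layer with 4 channels on an 8x8 feature map.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8, 8))
print(gram_matrix(acts).shape)  # (4, 4)
```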
In the same course you're also going to generate text: by giving it a huge corpus of poems written by Shakespeare a long time ago, you're going to teach the algorithm to generate poems as if they were written by Shakespeare — you can even write the first sentence, and it's going to continue and modify it. [01:00:11] You all have smartphones, and I guess you've noticed that when you write a sentence on your smartphone, it usually tells you what you should put next — and sometimes it's an emoji. You're going to do this part: you're going to implement the algorithm that takes an input sentence and tells you what emoji should come after it. [01:00:30] Machine translation is one of the applications that has been performing tremendously well with deep learning. You're going to implement not a full machine-translation system from one language to another, but a similar task that is just as
exciting: changing human-readable dates to machine-readable dates. [01:00:51] So, you know, let's say you're filling in a form and you're typing a date — the entity that gathers this data will have a hard time converting all these dates into a specific format. You're going to implement the algorithm that takes all these different dates in different formats and generates the right format — translating from human-readable to machine-readable dates. [01:01:14] And finally, trigger-word detection, which I also love, and which some of you have seen us build a year ago, I believe — an algorithm that Younes and Andrew and I have worked on. Trigger-word detection is the problem of detecting a single word: you know, you probably have objects from big companies that detect your voice and activate themselves on a trigger word.
You're going to build this algorithm for the trigger word "activate." Yeah — and many more projects that you will see. [01:01:49] Now, these are the things that you will all build in this course; every one of you will be led through the programming assignments, but you also have to choose your own project to work on throughout the course. These are examples of projects that CS 230 students have built in the past and which have worked very well. [01:02:07] One is coloring black-and-white pictures, using a neural network to map them to the color representation of their features. It's pretty cool, because we can now watch movies that were filmed in the 1930s or 1950s — or I don't know when — in color, which is super cool. [01:02:23] Predicting the price of an object from a picture: this was a great project in the first iteration of CS 230, where you give it a bike and it guesses how much the bike costs. So if you want to sell
stuff and you don't know for how much, you just give it a picture, and then you sell it at around that price. The student had actually implemented an algorithm to see which features of the bike are related to the price, so it was super fun to see whether it's the steering wheel, or the wheels, or the body of the bike that makes the bike expensive according to the algorithm. And many more. [01:03:03] So, last quarter specifically, we've had a lot of projects in physics and astrophysics and chemical engineering and mechanics, which was great. Some examples are detecting earthquake-precursor signals with a sequence model, and predicting an atom's energy based on its atomic structure. So you have, for instance, pieces of software that are really computationally expensive, which look at the atomic structure of an atom and output the energy of this atom — and this takes a long time.
These students tried to make it a three-second problem by running a neural network to find the energy instead. [01:03:42] Yeah — so you have a bunch of problems across industries: in healthcare, cancer and Parkinson's disorder detection — we've had a lot of these — and brain-tumor segmentation. Segmentation is the problem of classifying every pixel of an image: tell me which pixels correspond to the tumor, for example. [01:03:58] So we're really excited to see what you guys are going to build at the end of this quarter, and that's why we want you to build your team very quickly and get started, because the project is what you'll be proud of at the end of the quarter. We hope that you guys will come to the poster session proud of your poster, proud of the final project that you sent us, and that you can talk about it for the next ten years — or twenty, hopefully. And I guess Andrew can confirm that CS 229 students from
the past few years have done projects that are amazing today and have been featured around the world as research or industrial projects. [01:04:38] So, to sum up: in this course you will build a wide range of applications — it's very applied; there is some math, but less than in CS 229 and more than in CS 229A — and you have access to personalized mentorship thanks to the amazing TA team and the instructors. And finally, you will get to build a ten-week-long project. [01:05:05] So now we get to the serious thing: what we are up to this week. At the end of every lecture you'll have one slide that reminds you what you have to do for next week — next Wednesday, 11:00 a.m. So: create
So create your Coursera account based on the invite that you received. If you didn't receive an invite, send us a private post on Piazza and we will send it again. Finish the first two modules of Course 1, C1M1 and C1M2; that corresponds to two quizzes, two programming assignments, and around 20 videos, which are listed here. And for Friday, meaning two days from now, by the end of the day: find project teammates and fill in the form to tell us who your teammates are; it's going to help us find your mentor. [01:05:57] Finally, there is a TA section this Friday too. No project mentorship yet, that will start next week, but we will see you on Friday.

[01:06:04] I'm going to take a few questions, so shout them out. Yes?

[01:06:09] Yeah, these times are going to be posted at the end of this... For the TA sections, we're going to have a large range of TA sections on Friday, at basically every time, and you're going to be assigned to one of them. If you want to move, you can send a private Piazza post to us to be moved to another section.

[01:06:37] How big is the team? Usually it's from one to three students; exceptionally we would accept four students if the project is challenging enough. Yes?

[01:06:49] So, it is possible to combine the project with other classes; it means it's been done in the past. What we want is for you to give us a project and a poster that is framed as CS230 wants it, and you discuss it with us in order for us to validate whether you can merge this project with another class, because it requires deep learning, of course. You're not supposed to combine this project with something that doesn't have deep learning in it.

[01:07:17] Okay, all right, one more question. So you can retake the quizzes as much as you want on Coursera; we will
consider the last submitted quiz for this class. Okay? So you can resubmit if you didn't get full marks, yeah. Okay, thanks guys, and see you on Friday!

================================================================================ LECTURE 002 ================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 2 - Deep Learning Intuition
Source: https://www.youtube.com/watch?v=AwQHqWyHRpU
---
Transcript

[00:00:05] Hello everyone, welcome to the second lecture of CS230. So as I said earlier, you can go on menti.com from your smartphones or your computers and enter this code, 845709. We will use this tool for interactive questions during the lecture, and we will also use it to track attendance. I'll add it at the end of the lecture, but if you have time, do it now. Let's start the lecture while you guys are doing that.

[00:00:42] Okay, so today's lecture is going to be about deep learning intuition, and the goal is to give you a systematic way to think about projects, everything related to deep learning. It includes how to collect your
data, how to label your data, how to choose an architecture, but also how to design a proper loss function to optimize. All of these are decisions you are going to have to make during your projects, and we'll try to give you an overview of this systematic way of thinking for different projects. It's going to be more high-level than other lectures, but we hope it gives you a good start for your project.

[00:01:20] We'll start with a ten-minute recap of what you've seen in the first week about neural networks. As you know, you can think of machine learning, and deep learning in general, as modeling a function that takes an input, which can be an image, speech, natural language, or a CSV file; you give it to a box and get an output, which can be a classification: is it a cat? Is there a cat in this image, output one, or is there no cat in this image, output zero?

[00:01:57] And I think a good way to remember what a model is, is to define it as architecture plus parameters. The architecture is the design that you choose: logistic regression is the first one you've seen; you will see shallow neural networks and deep neural networks, then you will see convolutional neural networks and recurrent neural networks. These are all types of architectures, and you can choose to make them deeper or shallower. The parameters are the core part: the numbers that make your function take this cat as input and convert it to an output. These are millions of numbers, and the goal of machine learning and deep learning is to find all these numbers; we're all trying hard to find numbers, basically, millions of numbers in matrices.

[00:02:38] If you give it this cat and forward-propagate it, so we propagate it through the model to get an output, you will have to compare
this output to the ground truth. The function used to do so is called the loss function. You've seen an example of a loss function this week, the logistic loss function; we will see more loss functions later on. Computing the gradient of this loss function is going to tell you how much you should move your parameters in order to make the loss go down, so in order to make this function recognize cats better than before. You do that many, many times until you find the right parameters to plug into your architecture; you can then give it your cat and get an output.

[00:03:26] What is very interesting in deep learning is that many things can change. You can change the input: we talked about natural language, speech, and structured data in general. You can change the output: it can be a classification algorithm, it can be a multi-class algorithm. I can ask you to give me the breed of the cat instead of just asking whether there is a cat, which makes the problem more complicated. It can also be a regression problem: I give you the cat and I ask you to give me the age of the cat, which is much more complicated again. Does that make sense?

[00:03:59] Okay. Another thing that can change is your architecture; we talked about it earlier. And finally, the loss function. I think the loss function is something that people struggle with, to understand what loss function to choose for a specific project, and we're going to put a huge emphasis on that today. [00:04:16] And of course, in the architecture you can change the activation functions, and in this optimization loop you can choose specific optimizers. We're going to see, in about three weeks, all the optimizers: Adam, stochastic gradient descent, batch gradient descent, RMSprop, and momentum.
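The training loop just described (forward-propagate, compare to the ground truth with the logistic loss, move the parameters against the gradient, repeat) can be sketched in a few lines of NumPy. The synthetic data, learning rate, and step count below are made up for illustration; only the loss and its gradient follow the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy, linearly separable data (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # 200 examples, 4 features
true_w = np.array([1.5, -2.0, 0.5, 1.0])
y = (X @ true_w > 0).astype(float)             # synthetic 0/1 labels

w = np.zeros(4)                                # the parameters we try to find
b = 0.0
lr = 0.5                                       # learning rate (a hyperparameter)

for step in range(500):
    y_hat = sigmoid(X @ w + b)                 # forward propagation
    # Logistic (cross-entropy) loss, averaged over the batch.
    loss = -np.mean(y * np.log(y_hat + 1e-12)
                    + (1 - y) * np.log(1 - y_hat + 1e-12))
    dz = (y_hat - y) / len(y)                  # gradient of the loss w.r.t. z
    w -= lr * (X.T @ dz)                       # nudge the parameters so the
    b -= lr * dz.sum()                         # loss goes down

print(f"final loss: {loss:.4f}")               # far below the initial ~0.693
```

With w = 0 the predictions are all 0.5 and the loss starts at ln 2 ≈ 0.693; each pass of the loop moves w and b a little against the gradient, which is exactly the "do that many, many times" in the lecture.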
And finally, all the hyperparameters: what is the learning rate of this loop that I'm using for my optimization? We're going to see all of that together, but there's a bunch of things that can change in this scheme. Any questions on that in general? So far so good?

[00:04:52] Okay, so let's take the first architecture that we've seen together, logistic regression. As we know, an image in computer science can be represented by a 3D matrix; each matrix represents a certain color, RGB: red, green, blue. We can take all the numbers from this 3D matrix and put them in a vector; we flatten it in order to give it to our logistic regression. We forward-propagate it: we multiply it by W, which is our parameter, add B, which is our bias, give it to a sigmoid function, and get an output. If the network is trained properly, we should get a number that is more than 0.5 here, to tell us that there is a cat in this image.
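This forward pass can be sketched directly: flatten the 3D RGB matrix into one long vector, multiply by W, add the bias b, and apply the sigmoid. The 64x64 image size and the random parameters here are placeholders; a real classifier would use a trained W and b, so the 0.5 threshold is only meaningful after training.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3))     # stand-in for a 64x64 RGB image
x = img.reshape(-1) / 255.0                      # flatten: (64*64*3,) = (12288,)

W = rng.normal(scale=0.01, size=(1, x.size))     # parameters (random, untrained)
b = 0.0                                          # bias

z = W @ x + b                                    # linear part of the forward pass
y_hat = 1.0 / (1.0 + np.exp(-z))                 # sigmoid -> P(cat), shape (1,)

print(x.shape, float(y_hat[0]))                  # a probability strictly in (0, 1)
```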
[00:05:31] So this is the basic scheme. Now my question for you is: if I want to do the same thing, but I want to have a classifier that can classify several animals, so in the image there could be a giraffe, there could be an elephant, or there could be a cat, how would you modify this architecture? Yes?

[00:06:04] Exactly, so that's a good point: we could add several units, so several neurons, one for each animal, and we would call it multi logistic regression. So it could be something like that. We have a full connection here; before, all the inputs were connected to this one neuron, and now we added two neurons, and each neuron is going to be responsible for one animal. How do we know which neuron is responsible for which animal? Is the network going to figure it out on its own, or do we have to help it?

[00:06:43] Exactly, the label is important. What is going to tell your model "this neuron should focus on cat, this one should focus on elephant, this one should focus on giraffe" is the way you label your data. So how should we label this data now, if we were to do this specific task? Any ideas?

[00:07:06] One-hot vector, okay. So a one-hot vector means a vector with all zeros and a single one. Any other ideas? One, two, three... so I assume you're saying that each integer would correspond to a certain animal. Okay, any other ideas? Modifying the loss function? You want to put more weight on one animal, so you modify the loss function.

[00:07:47] I see, we don't want that, concretely. So I agree with the one-hot encoding, but I think there's a downside to one-hot encoding. What is the downside of the one-hot encoding?

[00:08:04] Yes, so you're saying that if we have a lot of animals, the labels only contain zeros and a single one, so there's a huge imbalance. I don't think that's an issue, because these neurons are independent from each
other right now. So yeah, it could run into an issue if you really have a lot of animals, that's true, but there is another problem with it. The problem is: do you think, if you one-hot encode your labels, you would be able to detect an image with a giraffe and an elephant in it? You would not be able to do so; you need the multi-hot encoding. So in this case, if there is a cat in the image, I would use a one-hot label, I would say 0 1 0, but if I have a dog and a cat in the image, I would say 1 1 0, okay? [00:08:53] The one-hot encoding works very well when you have the constraint of having only one animal per image, and in this case you would not use the activation function called sigmoid; you would use another one, which is softmax. Yeah, the softmax function, which we're going to see together, and for those of you who took CS229, you've probably heard of it.
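The labeling schemes from this discussion can be written out directly. A minimal sketch; the class ordering [dog, cat, giraffe] is my choice here so the vectors match the 0 1 0 and 1 1 0 examples, and softmax is included since it replaces the sigmoid in the one-animal-per-image case.

```python
import numpy as np

classes = ["dog", "cat", "giraffe"]   # assumed ordering, matching 0 1 0 for a cat

def one_hot(name):
    """One animal per image: pair this labeling with a softmax output layer."""
    v = np.zeros(len(classes))
    v[classes.index(name)] = 1.0
    return v

def multi_hot(names):
    """Several animals may appear: pair this with independent sigmoid units."""
    v = np.zeros(len(classes))
    for n in names:
        v[classes.index(n)] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())           # subtract the max for numerical stability
    return e / e.sum()

print(one_hot("cat"))                 # [0. 1. 0.]
print(multi_hot(["dog", "cat"]))      # [1. 1. 0.]
print(softmax(np.array([1.0, 2.0, 3.0])))  # three probabilities summing to 1
```

Softmax forces the outputs to compete (they sum to 1), which is exactly why it fits the "only one animal per image" constraint, while independent sigmoids let several classes fire at once.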
Okay, so what I wanted to explain here is that the way you choose your labeling is very important, and it's a decision you should make prior to starting the project, okay?

[00:09:23] In terms of notation, in this class we're going to use the following: a with a square bracket 1 denotes the activations of the first layer. So with the square bracket we denote the layer, and with the subscript we denote the index of the neuron in the layer, okay? And of course you can stack these neurons on top of each other to make the network more complex, depending on the task you're solving.

[00:09:49] Now, the concept I wanted to introduce in this recap is the concept of an encoding. Some of you have probably seen this image before. If you have a network that is not too shallow, you would notice that what the first neurons see are very precise representations of the data, so there are pixel-level
representations of the data. x_3, for example, is probably one of the three channels of the 3D matrix, just one number, so what this neuron sees is going to be a pixel-level representation of the image, okay? What this neuron sees, the one in the second layer, the hidden layer, is the representation output by all the neurons in the first layer. These are going to be more high-level, more complex, because the first neurons see pixels and they're going to output slightly more detailed information, like "I found an edge here, I found an edge there", and so on, and give it to the second layer. The second layer is going to see more complex information; it's going to give it to the third layer, which is going to assemble some high-level, complex features that could be eyes, nose, mouth, depending on what network you've been training. So this is an extraction of what's happening in each layer when the network was trained on face recognition.
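The layer-by-layer picture above can be made concrete by running an input through a tiny fully connected network and keeping each layer's activations a[1], a[2], a[3]. The layer widths and random weights below are made up, so these representations are untrained; the point is only that every layer outputs a vector, and that vector is the layer's representation of the input.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [12, 8, 4, 2]                      # made-up widths: input, two hidden, output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

def relu(z):
    return np.maximum(z, 0.0)

x = rng.normal(size=sizes[0])              # stand-in for a flattened image
activations = []                           # a[1], a[2], a[3] in the lecture's notation
a = x
for W, b in zip(Ws, bs):
    a = relu(W @ a + b)                    # one layer: linear map + activation
    activations.append(a)

print([v.shape for v in activations])      # [(8,), (4,), (2,)]
```

Reading off `activations[0]` gives the low-level representation (the "edges" level in the lecture's story), while the later entries are the deeper, more abstract ones.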
[00:11:07] Yes? ... Oh, I see: like, if you have a fully connected network. That's true, this type of visual is more often observed in convolutional neural networks, because these are filters, but this happens also in this type of network; it's just harder to visualize.

[00:11:34] Okay, so this is what we call an encoding. It means: if I extract the information from this layer, so all the numbers that are coming out of these edges, I will have a complex representation of my input data. If I extract the numbers that are at the end of the first layer, I will have a lower-level representation of my data, which might be edges, okay? We're going to use these encodings throughout this lecture. Any questions on that?

[00:12:05] Okay, so let's build intuition on concrete applications. We're going to start with a short warm-up with day-and-night classification, and then quickly move to face verification and
[00:12:15] quickly move to face verification and face recognition and after that we'll do [00:12:18] face recognition and after that we'll do some art generation and finish with a [00:12:20] some art generation and finish with a trigger word detection if we have time [00:12:22] trigger word detection if we have time we will talk about how to ship a model [00:12:24] we will talk about how to ship a model which is shipping architecture plus [00:12:26] which is shipping architecture plus parameters [00:12:27] parameters okay we're done fascist as I said on the [00:12:31] okay we're done fascist as I said on the architecture that lost the training [00:12:32] architecture that lost the training strategy to help you make decisions [00:12:34] strategy to help you make decisions during your project so let's start with [00:12:36] during your project so let's start with the first game we're given an image and [00:12:39] the first game we're given an image and we have to build a network that tells us [00:12:42] we have to build a network that tells us if the image is taken during the day [00:12:45] if the image is taken during the day label zero or was taken at night label [00:12:49] label zero or was taken at night label one so first question is what data set [00:12:55] one so first question is what data set do we need to collect okay labeled image [00:13:07] do we need to collect okay labeled image is captured during the day and during [00:13:09] is captured during the day and during the night I agree [00:13:11] the night I agree so probably oh yeah let me ask the [00:13:14] so probably oh yeah let me ask the question how many images that was wrong [00:13:17] question how many images that was wrong acting how many images like how do you [00:13:22] acting how many images like how do you get this number [00:13:26] can someone give me an estimate of how [00:13:28] can someone give me an estimate of how many images you need in order to solve [00:13:30] many images you need in 
order to solve this problem, and explain how you got this estimate?

[00:13:38] So you're saying a number similar to the number of parameters you have in the network. I think it's better to think of it the other way around: the network comes after. Right now you don't know what network you will use, so you cannot decide the number of data points based on your parameters. Later on, based on how flexible your network is, you can add more data, and that's probably what you meant, but at first you want to get a number.

[00:14:08] Yeah, more images than pixels within an image? I don't think that has anything to do with the pixels in the image. You can have a very simple task, like you only have images that are red and green and you want to classify red versus green; the image can be giant, you can have a lot of pixels, and it's not going to change the number of data points you need.

[00:14:31] Okay, so you're talking about computational resources. So the more images we have, probably the more computational resources we will need; so yeah, there's something like that. I think in general you want to try to gauge the complexity of the task. So let's say we did a problem that was cat recognition: detect whether there is a cat in an image or not. In this problem, we remember that with 10,000 images we managed to train a pretty good classifier. How do you compare this problem to the cats problem? Do you think it's easier or harder?

[00:15:07] Easier, yeah, I agree, it's probably easier. So in terms of complexity, this task looks less complex than the cat recognition task, so you will probably need less data. That's a rule of thumb. The second rule of thumb, and why I get to this image, is: what do we exactly want to do? Do we want to classify pictures that were taken outside, which seems even easier, or do we also want the network to
Do we want to classify pictures that were taken outside, which seems even easier, or do we also want the network to classify complicated pictures? What do I mean by complicated pictures? Pictures inside your house. So let's say in a picture you have a window on the right side: a human will be able to say it's the day because I see the window, but the network is going to take much longer to learn that, much longer than for pictures taken outside. What else, what are other complicated ones? Right, sunrise and sunset. In general that's complicated because you have to define it, and you have to teach your network what it means: is it night or day? [00:16:07] Okay, so depending on what task you want to solve, that's going to tell you if you need more data or less data. I think for this task, if you take outside pictures, 10,000 images is going to be enough, but if you want the network to detect indoor scenes as well, you probably need a hundred thousand images or so.
And this is based on comparing with projects you did in the past, so it's going to come with experience. [00:16:32] Now, as you know, when you have a dataset, you need to split it between train, validation, and test sets. Some of you have heard that; we're going to see it together in more depth. You need to train your network on a specific set and test it on another one. How do you think you should split these 10,000 images? 50/50 between training and test? 80/20? I think we would go more towards 80/20, because the test set is made to analyze whether your network is doing well on real-world data or not, and I think 2,000 images is probably enough to get that sense. You also want to put complicated examples in this test set. So I would go towards 80/20, and the bigger the dataset, the more I would put in the train set: if I have 1 million images, I would put even more, maybe 98%, in the train set and 2% to test my model.
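The split rule of thumb above can be sketched in a couple of lines of Python (illustrative only; the function name is not from the lecture):

```python
def split_sizes(n_images, test_fraction):
    """Return (train, test) counts for a given test fraction."""
    n_test = int(n_images * test_fraction)
    return n_images - n_test, n_test

# 10,000 images with an 80/20 split: 2,000 images held out for test
train_small, test_small = split_sizes(10_000, 0.20)   # (8000, 2000)

# 1,000,000 images: a 98/2 split still leaves 20,000 test images
train_big, test_big = split_sizes(1_000_000, 0.02)    # (980000, 20000)
```

The point is that the test set only needs to be big enough to estimate real-world performance, so its fraction shrinks as the dataset grows.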
[00:17:24] Okay, now I wrote bias here. What do I mean by bias? Yes: you need the correct balance between classes. You don't want to give 9,000 dark images and 1,000 day images; you want a balance between the two, to teach your network to recognize both classes. [00:17:48] Okay, what should be the input of your network? The pixel image, yeah. So this is an example of an input image: it's the Louvre Museum during the day. Harder question: what should be the resolution of this image, and why do we care? [00:18:20] That's great. So she said, and I'll repeat it for SCPD students as well: as low as you can while still achieving good results. Why do we want low resolution? Because in terms of computation it's going to be better. Remember, if I have a 32 by 32 image, how many pixels are there? If it's color, I have 32 times 32 times 3. If I have 400 by 400, I have 400 times 400 times 3, which is a lot more.
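A quick sketch of the pixel counts being compared (illustrative, not lecture code):

```python
def input_size(height, width, channels=3):
    """Number of raw input values for an image at a given resolution."""
    return height * width * channels

small = input_size(32, 32)      # 32 * 32 * 3   = 3,072 values
large = input_size(400, 400)    # 400 * 400 * 3 = 480,000 values
gray = input_size(400, 400, 1)  # dropping color divides the count by 3
```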
[00:18:48] So I want to minimize the resolution while still being able to achieve good performance. What does it mean to still achieve good performance, and how do I get this number? [00:19:05] Okay, a resolution similar to what you expect the algorithm to work on in real life; yeah, probably, I agree. What else? What other rule of thumb can you use in order to choose this resolution? [00:19:24] Great idea: compare to human performance. So there's one way to do it, which is the brute-force way: we would train models at different resolutions and then compare their results. Or you can be smart and use human performance as a comparison. I would print this image, or several images like this, at different resolutions on paper, and I would go to humans and say: classify those, classify those, and classify those.
And I would compare human performance across these three resolutions, in order to decide the minimum resolution I can use and still get perfect human performance. [00:19:58] By doing that, I got that 64 by 64 by 3 was enough resolution for a human to detect whether an image is taken during the day or during the night. This is a pretty small resolution in imaging, but this seems like an easy task. If you have to find the breed of a cat, you probably need more, because some cats look very alike and you need a high resolution to distinguish them, and maybe training for the human as well: I know only three breeds of cat, so I wouldn't be able to do it anyway. [00:20:29] What should be the output of the model? Labels: y equals zero for day, y equals one for night, I agree. What should be the last activation of the network, the last function? Sigmoid. We saw that sigmoid takes a number between minus infinity and plus infinity and puts it between 0 and 1, so that we can interpret it as a probability.
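A minimal sketch of the sigmoid just described (plain Python, no framework assumed):

```python
import math

def sigmoid(z):
    """Map any real number to (0, 1), so the output reads as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Very negative scores go toward 0 ("day"), very positive toward 1 ("night"),
# and a score of 0 gives exactly 0.5.
```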
What architecture would you use? [00:21:05] Fully connected or convolutional? I think later this quarter you will see that convolutional networks perform well in imaging, so we would directly use a convolutional one, but a shallow network, fully connected or convolutional, would do the job pretty well. You don't need a deep network, because you've gauged the complexity of this task. And finally, what should be the loss function? [00:21:39] The log likelihood: it's also called the logistic loss, that's the one you're talking about. The way you get this formula, and you'll prove it in CS229, we're not going to prove it here, is that you interpret your data in a probabilistic way and take the maximum likelihood estimate of the data, which gives you this formula. For those of you who want to go through the math, you can ask in office hours; the TAs are going to help you understand it more properly.
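The logistic loss described here can be sketched as follows (a plain-Python illustration; the `eps` clamp is an added numerical-stability detail, not something mentioned in the lecture):

```python
import math

def logistic_loss(y, y_hat, eps=1e-12):
    """Negative log-likelihood of one binary example, with y in {0, 1}."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)   # keep log() finite
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
```

The loss is small when the prediction agrees with the label and grows without bound as the prediction confidently disagrees.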
[00:22:05] Okay, and of course this means that if y equals zero, we want y hat, the prediction, to be close to zero, and if y equals one, we want y hat to be close to one. Okay, so this was the warm-up; now we're going to delve into face verification. Any questions on the day-and-night classification? Yes. [00:22:48] So your question is about how you choose the size of the test set versus the train set. In general, you would first ask: how many images, or data points, do I need in order to be able to understand what my model does in the real world? This can depend on the task: if I tell you about speech recognition, you want to figure out whether your model is doing well for all accents in the world, so your test set might be very big and very distributed.
In this case, you might have a few examples taken during the day, a few during the night, a few at dawn and sunset or sunrise, and also a few indoor. That's going to give you a number; there's no universally good number, you have to gauge it. Okay, one more question. Yeah, that's a good question: how do you choose the loss function? We're going to see in the next slides how to choose loss functions, but for this one specifically, you choose it because it's a convex function for a classification problem, so it's easier to optimize than other loss functions. There is a proof, but I will not go over it here. If you know the L1 loss, which compares y to y hat: that one is harder to optimize for a classification problem, and we would use it for regression problems instead. [00:24:04] Okay, so our new game is that the school wants to use face verification to validate student IDs in facilities like the gym.
You know, when you enter the gym, you swipe your ID, and then I guess the person sees your face on the screen based on this ID, looks at your face for real, and compares, let's say. So now we want to put a camera there: you swipe, and the camera is going to compare your image to the image in the database, to decide whether to let you in or not. Does that make sense? So what dataset do we need to solve this problem, what should we collect? [00:24:54] Okay, a mapping between the ID and the image, yeah. So schools probably have these databases, because when you enter the school you submit your image and you're given a card, an ID, so you have this mapping. Okay, what else? Pictures of every student labeled with their names, that's what you say. So this is a picture of [inaudible name]: it's the picture from when he was younger, the one he gave to the school when he arrived. [00:25:17] What should be the input of our model? Is it this picture? More photos of him? I'm asking just for the input of the model.
We probably need more photos of him as well, but what's going to be the image we give to the model? Exactly: the person standing in front of the camera when entering the gym. So this is the entrance of the gym, and he's trying to enter; so it's him. Okay, what should be the resolution? Those of you who have done projects in imaging, what do you think the resolution should be? [00:26:09] 256 by 256? Any other idea for the size? I think in general you would go over 400, so 400 by 400. What's the reason? Why do we need 64 for day versus night, and 400 for face verification? Yeah: there are more details to detect, like the distance between the eyes, probably the size of the nose and mouth, general features of the face. These are harder to detect in a 64 by 64 image, and you can test it: you can go outside and show two pictures of people that look like each other.
Ask people whether they can differentiate those two persons or not, and you'll see that below a certain resolution, people sometimes struggle. [00:26:58] Is color important? That's a good question; we should actually have talked about it for day and night. Is color important? Because if you remove the color, you basically divide the number of input values by three, right? So if we could do it without color, we would. In this case, color is going to be important, because you probably want your camera to work in different settings, day and night as well, where the luminosity and brightness are different; and also we all have different colors, and we all need to be detected, compared to each other. I might go somewhere on an island and come back, you know, full of color, but I still want to be able to access the gym. [00:27:40] Outputs: what should be the output?
[00:27:57] resolution but that's the trade-off between computational results so output [00:28:01] between computational results so output is going to be 1 if it's you and 0 if [00:28:04] is going to be 1 if it's you and 0 if it's not you in which case they would [00:28:06] it's not you in which case they would not let you in okay now the question is [00:28:12] not let you in okay now the question is what architecture should be used to [00:28:13] what architecture should be used to solve this problem now that we collected [00:28:15] solve this problem now that we collected the data set of mapping between student [00:28:18] the data set of mapping between student IDs and images [00:28:28] you know how do you know how many images [00:28:31] you know how do you know how many images you need to train the network you don't [00:28:34] you need to train the network you don't know you can find an estimate it's going [00:28:36] know you can find an estimate it's going to depend on your architecture but in [00:28:38] to depend on your architecture but in general the more complex the task the [00:28:40] general the more complex the task the more data you will need and we will see [00:28:42] more data you will need and we will see something called [00:28:42] something called error analysis in about 4 weeks which is [00:28:45] error analysis in about 4 weeks which is once your network works you're going to [00:28:48] once your network works you're going to give it a lot of examples detect which [00:28:50] give it a lot of examples detect which examples are misclassified by your [00:28:52] examples are misclassified by your network and you're going to add more of [00:28:54] network and you're going to add more of these in the training set so you're [00:28:56] these in the training set so you're going to boost your data set ok talking [00:28:59] going to boost your data set ok talking about the architecture if I ask you [00:29:01] about the architecture if I ask you what's the 
If I ask you what's the easiest way to compare two images, like these two images, the database image and the input image, what would you do? [00:29:08] Some sort of hash value? [00:29:16] A standardized function, okay: take this, run it through a specific function, take that, run it through the same function, and compare the two values. That's correct, that's a good idea, and the more basic version is to just compute the distance between the pixels. Just compute the distance between the pixels, and you get whether it's the same person or not. It doesn't work, and a few of the reasons are that the background and lighting can be different: if I subtract this from this, this pixel, which is, let's say, dark, is going to have a value of 0, and this pixel, which is white, is going to have a value of 255. The distance is gigantic, but it's still the same person; that's a problem.
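The naive pixel-distance comparison, and why a lighting change breaks it, can be sketched with toy four-pixel "images" (illustrative only, not lecture code):

```python
def pixel_distance(img_a, img_b):
    """Naive comparison: sum of absolute per-pixel differences."""
    return sum(abs(a - b) for a, b in zip(img_a, img_b))

# The same scene, but the second shot is under much brighter lighting:
dark = [0, 10, 20, 30]
bright = [225, 235, 245, 255]
# pixel_distance(dark, bright) is enormous even though it is the same
# person, which is exactly why raw pixel comparison fails here.
```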
A person can wear makeup, can grow a beard, can be younger in the picture, the ID can be outdated: so it doesn't work to just compare these two pictures directly. We need to find a function that we will apply to these two images, which will give us a better representation of the image. So that's what we're going to do now: we will encode the information of the picture, with the encoding that we talked about, in a vector. We want a vector that represents features like the distance between the eyes, the nose, the mouth, color, hair, all these types of things. So we take the picture from the ID, run it through a network, and hopefully we can find a good encoding from this network. Then we take the picture captured at the facility, run it through the deep network, get another vector, and hopefully, if we trained the network properly, these two vectors should be close to each other.
[00:30:56] Let's say we have a threshold of 0.5. If 0.4 is the distance between these two, it's less than the threshold, so I would say that it's the right person: it's you. Does this scheme make sense? [00:31:11] What does the 128-dimensional vector mean? So the question is: can I say that the third entry corresponds to something specific? It's complicated to say, but depending on what network you choose and what training process you choose, you will get a different network and a different vector. That's what we're going to talk about now. The question is: how do I know that this vector is good? Right now, if I take a random network and give it my image, it's going to output a random vector, and this vector is not going to contain any useful information. I want to make sure that this information is useful, and that's how I will design my loss function.
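The encoding-plus-threshold scheme can be sketched as follows (illustrative; the L2 distance and the 0.5 threshold follow the numbers in the lecture, the function names are made up):

```python
import math

def l2_distance(enc_a, enc_b):
    """Euclidean distance between two encoding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(enc_a, enc_b)))

def same_person(enc_id, enc_camera, threshold=0.5):
    """Admit the student when the two encodings are closer than the threshold."""
    return l2_distance(enc_id, enc_camera) < threshold

# A distance of 0.4 is below the 0.5 threshold: it's the right person.
```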
[00:31:50] Okay, so just to recap: we put all the student faces' encodings in a database. Once we have this, given a new picture, we compute the distance between the new picture and all the vectors in the database and see if we find a match; oh, sorry: we compare the vector of the input image with the vector corresponding to the ID image, and if the distance is small, we consider that it's the same person. [00:32:11] Okay, now let's talk about the loss and the training. To figure out whether this vector corresponds to something meaningful, first we need more data, because we need our model to understand the features of the face in general, and a university that has a thousand students is probably not going to have enough: a thousand images won't push a model to understand all the features of the face. Instead, we will go online, find open datasets with millions of pictures of faces, and have the model learn from those faces, to then use it inside the facility. [00:32:47] There was a question in the back.
Like we did with [inaudible], but with a one-hot per student? That's another option. So the question is: why can't you use a one-hot encoding? We could build a classifier that has n output neurons, corresponding to the number of students in the school; you take an image, run it through the network, and it tells you which student it is. What's the issue with that? Every year, students enter the school, so you would have to modify your network every year, because you have more students and you need a larger output vector. We don't want to retrain our networks all the time. [00:33:28] Okay, so what we really want, if we want to put it in words, is, oh, there's a mistake here, what we really want is: if I give you two pictures of the same person, I want a similar encoding, I want the vectors to be similar.
I want the vectors to be very different. [00:33:52] We are going to rely on these two assumptions, these two bullet points, to build our loss function, by giving it triplets. A triplet means three pictures: one that we call the anchor, which is a picture of a person; one that we call the positive, which is the same person as the anchor but a different picture of that person; and a third that we call the negative, which is a picture of someone else. What we want is to minimize the encoding distance between the anchor and the positive, and to maximize the encoding distance between the anchor and the negative. Do these two thoughts make sense? So now my question for you is: what should be the loss function? Please go on Menti and enter the code; there are three options, A, B and C. Choose which of these you think is the right loss function to use for this problem.
[00:34:53] You have it on your phone as well; I know it is small on the screen and cut off, it is better here. [Music] Eight four five seven zero nine. Can you see it on your phone? By Enc(A) I mean the encoding vector of the anchor; by Enc(P) I mean the encoding vector of the positive image, after you run them through the network. Okay, 30 more seconds... 20 more seconds... okay, let's see what we have.
[00:37:11] So two thirds of the people think it is the first answer, A. I will read it for everyone: the loss is equal to the L2 distance between the encoding of A and the encoding of P, minus the L2 distance between the encoding of A and the encoding of N. Someone who answered this: do you want to give an explanation?
[00:37:39] "Yes: we are trying to minimize the first difference, between A and the positive, and to maximize the difference between A and the negative; when you subtract, the second part is responsible for the maximizing." Yes, that's correct. I will repeat it for the students watching the video: we want to maximize the distance between the encoding of A and the encoding of the negative, and that is why we have a minus sign here; we want the loss to go down, and with the minus sign it goes down as we maximize that term. On the other hand, we want to minimize the other term, because it is a positive term. Okay, do we all agree? That was the first time we used this tool; it will be quicker next time.
[00:38:28] So we have figured out what the loss function should be, and now think about it: now that we have designed our loss function, we are able to use an optimization algorithm. Run an image through the network... sorry, run three images through the network, get three outputs (the encoding of A, the encoding of P, the encoding of N), compute the loss, take the gradients of the loss, and update the parameters in order to minimize the loss. Hopefully, after doing that many times, we get an encoding that represents the features of the face, because the network will have had to figure out who are the same people and who are different people. Does that make sense? This is called the triplet loss.
[00:39:14] And I cheated a little bit in the quiz: I did not write the alpha. The true loss function contains a small alpha. Do you know why? "So you don't have a negative loss?" That is not exactly the role of the alpha: to avoid a negative loss, what you can do is take the maximum of the loss and zero, and train on that. But there is another reason why we have this alpha.
[00:39:55] "Is it about false negatives?" No, it is not about that. Sometimes you have an alpha in a loss function to put a weight on some classes, but this is an additive alpha, not a multiplicative alpha, so it has nothing to do with that. "To penalize large weights?" Are you talking about regularization? If we had the weights in this formula, next to the alpha, like alpha times the norm of the weights, that would be regularization; but here this term does not penalize the weights, and it is not going to affect the gradient. The reason we have it here is this: suppose the encoding function is just the zero function. Then we would have encoding of A minus encoding of P equal to zero minus zero, and here zero minus zero, so we would get a "perfect" loss of zero, and we still did not train our network; it just learned the zero function. So this alpha is called the margin, and it pushes your network to learn something meaningful instead of stabilizing at zero.
[00:41:18] It also has to do with initialization, but we have not talked about initialization yet; I think we have only seen zero initialization so far. Another way to keep the network from becoming stable at zero is to change the initialization scheme, and in two weeks we are going to see different initialization schemes together.
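Putting the discussion above together, the triplet loss with the margin alpha (and the clamp at zero mentioned a moment ago) can be sketched in NumPy. The margin value 0.2 and the 128-dimensional encodings are illustrative assumptions, not values given in the lecture.

```python
import numpy as np

def triplet_loss(enc_a, enc_p, enc_n, alpha=0.2):
    """L = max(||Enc(A) - Enc(P)||^2 - ||Enc(A) - Enc(N)||^2 + alpha, 0).
    alpha is the margin; 0.2 is an illustrative value."""
    pos = np.sum((enc_a - enc_p) ** 2)  # pull anchor and positive together
    neg = np.sum((enc_a - enc_n) ** 2)  # push anchor and negative apart
    return max(pos - neg + alpha, 0.0)

# The degenerate all-zero encoding from the discussion above:
zero = np.zeros(128)
print(triplet_loss(zero, zero, zero))  # 0.2 -> the margin keeps this loss positive
```

Without the margin, the zero function would achieve a "perfect" loss of 0; with it, collapsing all encodings to zero still costs alpha, so the network has to learn something meaningful.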
[00:41:51] The question is: how do we know that this network is going to be robust to rotations of the image, or scaling of the image, or translation of the image? We know it because of the dataset: we are going to give, say, your picture and your picture scaled, and tell the network that this is the same person, so the network will have to learn that a change of scale does not mean it is a different person. It has to learn this feature. Okay, one more question and then we move on.
[00:42:20] Yeah, good question: why is it a problem to stabilize at zero? The loss function is kept positive: in the paper that you can find, the FaceNet paper, they do not train exactly this loss; they train the maximum of this loss and zero. Okay, so you train and you get the right function.
[00:42:44] Now let's make the problem a little more complicated. What we did so far was face verification; now we are going to do face recognition. What's the difference? There is no more ID: you just have a camera in the facility, you enter, the camera looks at you and finds you. How would you design this new network? Yes, in the back.
[00:43:17] "You've added an element of recognition as well: before, you would just stand in front of it and you knew that every picture had a face; now it needs to detect the face." Okay, so you are saying maybe we need to add an element to the pipeline, a detection element. That's true: in general, for face recognition, if you have a picture that is quite big, you want a first network that finds the face on the picture, detects it, crops it, and gives the crop to another network. That could be used in verification as well. Great.
[00:43:56] So the difference may be small, and what you are saying is that maybe we can reuse the verification algorithm we trained, but instead of a one-to-one comparison we do a one-to-n comparison. We have the pictures of all the students in the database; we run all these database pictures through the model and get a vector that represents each of them. Now you enter the facility, we take your picture, run it through the model, get your vector, and compare this vector to all the vectors in the database to identify you. What's the complexity of this? It is the number of students: for every prediction we go over the whole database. A common model you can use to do that is k-nearest neighbors. Of course, if you have only one picture per student it is not going to be very precise, but if you collect three pictures per student and run a two-nearest-neighbors algorithm, you would decide that if the two nearest pictures are of the same person, it is likely that you are the person in those two pictures. Okay.
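The one-to-n identification just described can be sketched as a nearest-neighbor search over the database of encodings. The toy two-dimensional vectors, the names, and the two-nearest-neighbors agreement rule are illustrative assumptions.

```python
import numpy as np

def identify(query, db_embeddings, db_labels, k=2):
    """1-to-n face recognition: compare the query encoding against every
    encoding in the database (O(n) per prediction), take the labels of the
    k nearest neighbors, and accept only if they agree."""
    dists = np.linalg.norm(db_embeddings - query, axis=1)
    nearest = np.argsort(dists)[:k]
    found = {db_labels[i] for i in nearest}
    return found.pop() if len(found) == 1 else None  # agreement -> identity

# Toy database: three encodings per student (2-dimensional for illustration)
db = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # "alice"
               [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])  # "bob"
labels = ["alice"] * 3 + ["bob"] * 3

print(identify(np.array([0.05, 0.05]), db, labels))  # alice
print(identify(np.array([5.05, 5.05]), db, labels))  # bob
```

The linear scan over the database is exactly the cost discussed above: one comparison per stored encoding for each prediction.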
[00:45:09] Now let's make it a little more complicated. You have probably seen it on your phone: sometimes you take a picture and it recognizes that it is your grandmother, or your grandfather, or your mother and father. What's happening behind is that there is some clustering going on: we have a bunch of images and we want to cluster them together. This is another algorithm that you have seen in CS229 and CS229A, the k-means algorithm, which is a clustering algorithm. Say you have a phone with thousands of pictures of, let's say, 20 different people. What you want is to cluster all the pictures of the same person separately. What you will do is encode all the pictures into vectors and then run a clustering algorithm like k-means in order to group them: these are the vectors that look like each other, and these are the vectors that look like each other. Okay, and then you can simply give folders to the users, with all the pictures of your mom and all the pictures of your dad.
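The clustering step can be sketched with a minimal hand-rolled k-means over encoding vectors (in practice you would use a library implementation; the toy data, the naive initialization, and k = 2 are assumptions for illustration).

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means: cluster encoding vectors into k groups so that
    pictures of the same person end up in the same 'folder'.
    Naive init (first k points); assumes no cluster empties on this toy data."""
    centers = X[:k].copy()
    for _ in range(iters):
        # assign each vector to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # move each center to the mean of its cluster
        centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
    return assign

# Toy encodings: two tight groups standing in for two people's photos
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.1], [5.0, 5.1]])
assign = kmeans(X, k=2)
print(assign[:3], assign[3:])  # [0 0 0] [1 1 1]
```

Each resulting cluster corresponds to one "folder" of pictures of the same person.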
[00:46:24] Good question: how do you define the k? Does someone have an idea? [00:46:41] So one way, as you said, is to try different values, train the clustering algorithm, and look at a certain loss in order to choose the k. There is actually an algorithm called X-means that is used to find the k; you might search for that. There is also a method called the elbow method that you may want to look up as well to determine the k. Okay. And as you said, maybe we need to detect the face first, then crop it and give it to the algorithm. One more question on face verification.
[00:47:22] ... "Do you need to use the vector that you trained for classification?" Sorry, I don't understand... oh, you mean: where is the encoding coming from in the network? Okay, good question.
So you have a deep network, and you want to decide where you should take the encoding from. In general, the more complex the task, the deeper you go. For face verification, what you want, and you know it as a human, are features like the distance between the eyes, the nose, and so on, so you have to go deep: you need the first layers to figure out the edges and give them to the second layer; the second layer figures out the nose and the eyes and gives them to the third layer; the third layer figures out the distance between the eyes, the distance between the ears. So you go deeper and take the encoding from a deep layer, because you know you want high-level features. Okay.
[00:48:33] Art generation: given a picture, make it look beautiful. As usual: data. What do we need? [00:48:48] It's a little complicated, because we have to define what "beautiful" is. So, data: some beautiful pictures. I know, maybe my concept of beautiful is different from yours.
[00:49:06] "Aim at a certain style?" Let's go with that, that's a good point: we might say that beautiful means paintings, since paintings are usually beautiful, so you want a certain kind of style. Yeah, that's true. So let's say we have the data that we want. The way we define this problem is: let's take an image that we call the content image (here again you have the Louvre Museum), and let's take an image that we call the style image, a painting that we find beautiful. What we want is to generate an image that has the content of the content image, but painted by the painter of the style image. This style image is a Claude Monet, and here we have the Louvre painted by Claude Monet, even though he was dead when this pyramid was created. So that's our goal, and this is what we would call art generation. There are other methods, but this is one.
[00:50:08] So how do we do that? What architecture do we need? Please try to use what we have seen in the past two applications together: what training scheme, what architecture? Anyone want to try?
[00:50:55] You are saying we take some style images, give them as input to a network, and the network outputs yes or no, one or zero? We want to generate an image, probably.
[00:51:24] So what you are proposing is: we take an image that is the content image, and we have a style network which will style this image, so we get a styled version of the content; it uses certain features of the style and changes the image according to what the network has learned. This is actually done; it is one method, but not the one we will see today. The small issue with this method is that you have to train one network per style.
learn one style network learns one style you give the content it [00:52:03] learns one style you give the content it gives you the constant with the specific [00:52:04] gives you the constant with the specific style of the model what we want to do is [00:52:07] style of the model what we want to do is to have no model that is restricted to a [00:52:09] to have no model that is restricted to a specific style I want to be able to give [00:52:12] specific style I want to be able to give a painting of Picasso and get this [00:52:14] a painting of Picasso and get this picture painted by Picasso so the [00:52:18] picture painted by Picasso so the difference here is that we're not we're [00:52:20] difference here is that we're not we're not going to learn parameters of a [00:52:22] not going to learn parameters of a network like we did for face [00:52:23] network like we did for face verification or for the in a [00:52:25] verification or for the in a classification we're going to learn an [00:52:27] classification we're going to learn an image so remember when we talked about [00:52:30] image so remember when we talked about back propagation of the gradient to the [00:52:32] back propagation of the gradient to the parameters we're not going to do that [00:52:34] parameters we're not going to do that we're going to back propagate all the [00:52:36] we're going to back propagate all the way back to the image let's see how it [00:52:39] way back to the image let's see how it works so first we have to understand [00:52:43] works so first we have to understand what content means and what style means [00:52:44] what content means and what style means to do that we're going to use encoding [00:52:47] to do that we're going to use encoding we're going to to use the ideas that we [00:52:49] we're going to to use the ideas that we talked about later [00:52:50] talked about later giving the content image to a network [00:52:53] giving the content image to a network that is very 
good will allow us to [00:52:55] that is very good will allow us to extract some information about the [00:52:57] extract some information about the content of this image we specifically [00:53:00] content of this image we specifically sew together that earlier layers we [00:53:02] sew together that earlier layers we detect the edges the edges are usually a [00:53:05] detect the edges the edges are usually a good representation of the content [00:53:09] good representation of the content so I might have a very good Network give [00:53:12] so I might have a very good Network give my contents image extract the [00:53:14] my contents image extract the information from the first layer this [00:53:15] information from the first layer this information is going to be the content [00:53:17] information is going to be the content of the image now the question is how do [00:53:19] of the image now the question is how do I get the style I want to give my style [00:53:24] I get the style I want to give my style image and find a way to extract the [00:53:26] image and find a way to extract the style that's what we're going to learn [00:53:29] style that's what we're going to learn later in this course it's a technique [00:53:31] later in this course it's a technique called Graham matrix and the important [00:53:33] called Graham matrix and the important thing to remember is that the style is [00:53:35] thing to remember is that the style is non localized information if I show you [00:53:39] non localized information if I show you the pictures in the previous slide sorry [00:53:43] the pictures in the previous slide sorry here [00:53:45] here you see that in the generated picture [00:53:47] you see that in the generated picture although on the style image there was a [00:53:49] although on the style image there was a tree on the left side there is no tree [00:53:52] tree on the left side there is no tree on the generated image it means when I [00:53:55] on the generated image it 
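The Gram matrix mentioned here (covered properly later in the course) can be sketched as follows. The point of the sketch is the "non-localized" property: shuffling the spatial positions of a feature map leaves the Gram matrix unchanged. The feature-map shape is an illustrative assumption.

```python
import numpy as np

def gram_matrix(features):
    """features: activations of one layer, shape (channels, height, width).
    Flatten the spatial dimensions and take all channel-to-channel dot
    products: G[i, j] measures how strongly feature i fires together with
    feature j, regardless of WHERE in the image they fire."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T  # shape (channels, channels)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 4))  # toy feature map: 8 channels, 4x4 spatial

# Shuffle the spatial positions (e.g. move the 'tree' somewhere else):
perm = rng.permutation(16)
shuffled = feats.reshape(8, 16)[:, perm].reshape(8, 4, 4)

# The Gram matrix is unchanged, so it captures no spatial (localized) info.
print(np.allclose(gram_matrix(feats), gram_matrix(shuffled)))  # True
```

This is why the generated image can pick up Monet's technique without copying the tree: the Gram matrix throws away the positions and keeps only the feature co-occurrence statistics.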
[00:53:55] It means that when I extracted the style, I extracted only non-localized information: what technique has Claude Monet used to paint? I did not want to extract the tree that was in the style image; I don't want its content. Okay. So we are going to take a network that understands images very well, and these are common online: you can find ImageNet classification networks that were trained to recognize thousands of objects. Such a network is going to understand basically anything you give it. If I give it the Louvre Museum, it is going to find all the edges very easily, figure out that it is daytime, figure out that there are buildings on the sides, and all the features of the image, because it was trained for months on thousands of classes.
[00:54:44] Let's say we have this network. We give our content image to it and extract information from the first few layers; this information we call content C, the content of the content image. Does that make sense? Now I give it the style image, and I use another method, called the Gram matrix, to extract style S, the style of the style image. Okay, and now the question is: what should be the loss function? Let's go on Menti, same code as usual; just open it, and if you want me to repeat the code: eight four five seven zero nine. These are the three proposals for the loss function. A reminder: content C means the content of the content image; style S means the style of the style image; style G means the style of the generated image; content G means the content of the generated image. Take a minute; it is a bit small on the screen, use the code.
[00:56:57] What? So, just repeating the question: why do we need to use an ImageNet network, since we don't really need to classify any image and it is going to waste time? The reason we
it's gonna waste time. the reason we [00:57:08] need ImageNet is because ImageNet [00:57:10] understands our pictures. so if you [00:57:12] give the content image to a network [00:57:15] that doesn't understand pictures very [00:57:16] well, you're not going to get the edges [00:57:19] very well. so you want a network where you [00:57:23] don't care about the classification [00:57:24] output; you just cut the network in the [00:57:26] middle and extract the layers in the middle. [00:57:28] okay, let's see what the answers are [00:57:31] according to you guys. so yeah, I repeat: [00:57:41] we're not training anything here — we're [00:57:43] taking a model that exists and we use [00:57:46] this model; we're going to talk about the [00:57:48] training after. okay, someone has [00:57:51] answered the second question and I will [00:57:53] read it out loud: the loss is the L2 [00:57:56] difference between the style of the [00:57:57] style image and the generated style, plus [00:58:00] the L2 distance between the generated [00:58:03] image's content and the content [00:58:05] image's content. yeah. [00:58:16] so yeah, we want to minimize both terms [00:58:19] here. we want the content of the [00:58:22] content image to look like the content [00:58:23] of the generated image, so we want to [00:58:25] minimize the L2 distance of these two, [00:58:27] and the reason we use a plus is because [00:58:29] we also want to minimize the difference [00:58:31] of styles between the generated and the [00:58:32] style image. so you see we don't have any [00:58:35] term that says the distance between the style of the content [00:58:37] image and the style of the generated image is [00:58:40] minimized — this is the loss we want. okay. [00:58:46] okay, so just going over the [00:58:53] architecture again: the loss function [00:58:56] we're going to use will be the one we [00:58:59] saw, and one thing that I want to [00:59:02] emphasize here is we're not training the [00:59:04] network — there's no parameter that we [00:59:06] train. the parameters are in the ImageNet [00:59:08] classification network; we use them, [00:59:10] we don't train them. what we will train [00:59:12] is the image. so you get an image, you [00:59:15] start with white noise, you run this [00:59:18] image through the classification network, [00:59:20] but you don't care about the [00:59:22] classification of this image — ImageNet [00:59:24] is going to give a random class to this [00:59:25] image, totally random. [00:59:28] instead you will extract content G and [00:59:32] style G. so from this image, you run [00:59:36] it and you extract information from this [00:59:39] network using the same techniques that [00:59:40] you've used to extract content C and [00:59:43] style S. so content C and style S, you [00:59:45] have them; you're able to compute [00:59:48] the loss function, because now you have [00:59:50] the four terms of the loss function. you [00:59:53] compute the derivatives, and instead of [00:59:55] stopping in the network, you go all the [00:59:58] way back to the pixels of the image, and [01:00:00] you decide how much should I move the [01:00:02] pixels in order to make this loss go [01:00:04] down. and you do that many times, [01:00:07] and the more you do that, [01:00:08] the more this is going to look like the [01:00:11] content of the content image and the [01:00:12] style of the style image.
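The loop just described — freeze a pretrained feature extractor, compute the four terms, and gradient-descend on the pixels of the generated image — can be sketched as below. This is a toy illustration, not the lecture's code: the "network" is a single fixed random linear layer standing in for pretrained conv layers, the image sizes are invented, and the Gram matrix is used for the style terms as the lecture describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen stand-in for a pretrained ImageNet network: one fixed linear
# "layer" mapping a 27-pixel image to 4 feature channels x 9 positions.
# These weights are NEVER updated -- only the generated image is.
W = 0.1 * rng.standard_normal((4, 9, 27))

def features(img):
    """Activations of the frozen layer, shape (channels, positions)."""
    return np.einsum('cpx,x->cp', W, img)

def gram(F):
    """Gram matrix: cross-correlation of channels, with spatial position
    summed out -- which is exactly why the extracted style is non-localized."""
    return F @ F.T

content_img = rng.standard_normal(27)       # the content image
style_img = rng.standard_normal(27)         # the style image

content_C = features(content_img)           # content C: content of the content image
style_S = gram(features(style_img))         # style S: style of the style image

def loss(F):
    """||content C - content G||^2 + ||style S - style G||^2."""
    return np.sum((content_C - F) ** 2) + np.sum((style_S - gram(F)) ** 2)

g = rng.standard_normal(27)                 # generated image: start from white noise
lr = 1e-3
first = loss(features(g))

for _ in range(2000):
    F = features(g)                         # content G, and style G via gram(F)
    # analytic gradient of the loss w.r.t. the feature maps
    dF = -2 * (content_C - F) - 4 * (style_S - gram(F)) @ F
    # ...then all the way back to the PIXELS; the network stays frozen
    g -= lr * np.einsum('cpx,cp->x', W, dF)

final = loss(features(g))                   # much smaller than `first`
```

The style image touches the network once (to compute `style_S`); after that only `g` is updated, which mirrors the "we train the image, not the network" point above.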
yeah — yeah, so [01:00:21] the downside of this network is that, although [01:00:24] it has the flexibility [01:00:25] to work with any style and any content, every time [01:00:28] you want to generate an image you have [01:00:29] to do this training loop, while the other [01:00:31] network that you talked about doesn't [01:00:33] need that, because that model is trained [01:00:34] to convert the content to a style; you [01:00:36] just give it the image. which network are you talking [01:00:46] about — this network? yeah. so, do we need to [01:00:48] train this network on Monet images? [01:00:50] usually not: this network is trained on [01:00:53] millions of images; [01:00:54] it's basically seen everything you can [01:00:57] imagine. what do you mean, back-propagate [01:01:06] properly? here you're not training the [01:01:08] network — you're giving it this image, [01:01:10] computing the back-propagation, and going [01:01:12] back to the image, only updating the [01:01:14] image; you don't update the network. it [01:01:19] comes from content C and style S — it comes [01:01:21] from style S. so for the loss function, [01:01:24] the baseline is: you have [01:01:26] content C and style S, because you've [01:01:28] chosen a content picture and a style [01:01:29] picture, and now at every step you [01:01:32] will find the new content G and style G, [01:01:35] back-propagate, update, give it again, get [01:01:38] the new content G and style G, update [01:01:40] again, and so on. no — the art, you never [01:01:45] touch it; [01:01:46] the art image just [01:01:48] touches the neural network one time: you [01:01:50] extract style S, and then that's [01:01:51] all, you don't use it again. okay, let's do [01:01:54] one more question — yeah. good question: why [01:02:00] do you start with white noise instead of [01:02:02] the content or the style? actually, do you [01:02:04] think it's better to start with the [01:02:05] content or the style? probably the style? [01:02:09] I think probably the content, because [01:02:13] the edges at least looking like the content [01:02:15] is going to help the network converge [01:02:19] quicker — yeah, that's true. [01:02:20] you don't have to start with white noise; [01:02:21] in general the baseline is to start with [01:02:23] white noise so that anything can happen. [01:02:25] if you give it the content to start with, [01:02:27] it's going to have a bias towards the [01:02:28] content, but if you train longer it's fine. [01:02:31] okay, one more question and then we can — [01:02:40] so, ImageNet doesn't understand what's content [01:02:42] and style, but ImageNet finds the edges [01:02:45] in the image, and so you can give it the [01:02:48] content image and extract the first few [01:02:49] layers to get information about them, [01:02:51] because when it was trained on [01:02:53] classification it needed to find the [01:02:55] edges — to find that a dog is a dog, you [01:02:58] first need to find the edges of the dog — [01:02:59] so it's trying to do so. and for the [01:03:02] style, it's complicated to understand the [01:03:05] style, but the network finds all the [01:03:07] features in the image, and then we use a [01:03:09] post-processing technique that is called [01:03:10] the Gram matrix in order to extract [01:03:12] what we call style; it's basically a [01:03:15] cross-correlation of all the features of [01:03:17] the network. we will learn it together [01:03:18] later. okay, let's move on to the
next application, because we don't have too [01:03:24] much time. so this is the one I prefer: [01:03:27] given a 10-second audio speech, detect [01:03:29] the word "activate". so you know, we talked [01:03:31] about trigger word detection, and there [01:03:33] are many companies that have this wake [01:03:34] word thing, where you have a device at [01:03:36] home and when you say a certain word it [01:03:38] activates itself. so here's the same [01:03:40] thing for the word "activate". what data do [01:03:42] we need — do we need a lot or not? probably [01:03:51] a lot, because there are many accents, and [01:03:53] one thing that is counterintuitive is [01:03:54] that if two humans — let's say [01:03:58] two women — speak, as a human you [01:04:02] would say these voices are pretty [01:04:05] similar, right, you can detect the word. [01:04:08] what the network sees is a list of [01:04:12] numbers that are totally different from [01:04:13] one person to another, because the [01:04:16] frequencies we use in our voices are [01:04:17] totally different from each other. so the [01:04:19] numbers are very different, although as a [01:04:21] human we feel that it's very similar. so [01:04:25] we need a lot of ten-second audio clips — [01:04:28] that's it. what should be the [01:04:31] distribution? it should contain as many [01:04:33] accents as you can, as many female and male [01:04:36] voices, kids, adults, and so on. what should [01:04:41] be the input of the network? it should be [01:04:44] a 10-second audio clip [01:04:45] that we can represent like that: the [01:04:47] 10-second audio clip is going to contain [01:04:49] some positive words, in green — the positive [01:04:52] word is "activate" — and it's also going to [01:04:55] contain negative words, in pink, like [01:04:58] "kitchen", "lion", whatever words that are not [01:05:03] "activate", and we want only to detect the [01:05:05] positive word. what should be the sample [01:05:08] rate? again, same question: you would test [01:05:11] on humans, and you would [01:05:14] also talk to an expert in speech recognition [01:05:16] to know what's the best sample [01:05:18] rate to use for speech processing. what [01:05:22] should be the output — any ideas? [01:05:34] okay, any other ideas? classification, yes/no — so [01:05:39] 0 or 1. actually,
let's do a test. [01:05:43] so we have three audio [01:05:46] speeches here: speech 1, speech 2, speech 3. [01:05:49] I don't know if we have the sound [01:05:51] here — do we have the sound? [01:05:57] maybe we'll have it now, okay, let's try. [01:06:06] (a clip in Italian plays) so this is [01:06:11] labeled 1. nobody speaks Italian? on to the [01:06:18] second one. (two more clips play) [01:06:31] okay, what's the wake word? has anybody [01:06:36] found what the trigger word was? we [01:06:40] need more. so, you know what's fun: this [01:06:45] is a valid scheme for labeling — it's [01:06:47] definitely possible — but it seems that [01:06:49] even for humans this labeling scheme is [01:06:51] super hard; we're not able to find [01:06:54] what's happening. I don't know — even [01:06:56] though I made this slide, I don't even [01:06:57] remember it today. now let's try [01:07:01] something else. [01:07:01] okay, so now we have a different labeling [01:07:05] scheme that tells us also
where the wake [01:07:08] word is happening. let's hear it again. [01:07:12] (the Italian clips play again) [01:07:34] okay, what's the trigger word? "pomeriggio" — yeah, "pomeriggio" means [01:07:37] afternoon in Italian. so you see, what I'm [01:07:42] trying to illustrate is: compare the [01:07:46] human to the computer and you will get [01:07:48] what's the right labeling scheme to use, [01:07:49] and of course the labeling scheme here [01:07:52] is going to be better for the model [01:07:54] than the first one — and we just [01:07:56] proved it. the important thing is to know [01:08:00] that the first one would also work; we [01:08:02] just need a ton of data — we need a lot [01:08:05] more data to make the first labeling [01:08:06] scheme work than we need for the second [01:08:08] one. does that make sense? [01:08:11] so yeah, we will use something like that. [01:08:19] good question, actually — this is not the [01:08:22] best labeling
scheme. as you said: should [01:08:25] the one come before or after the word [01:08:27] was said? what do you guys think — [01:08:30] before? after? yeah. you will see that [01:08:34] recurrent neural networks are [01:08:37] basically going to look at the data just as [01:08:40] humans do — temporally, from the [01:08:42] beginning to the end — and in this case [01:08:44] you need to hear the word in order to [01:08:46] detect it, so we're going to put the one [01:08:48] right after the word was said. another [01:08:50] issue that we have with this is that [01:08:52] there are too many zeros — it's highly [01:08:54] unbalanced, so the network is pushed to [01:08:56] always predict zeros. so what we do, as a [01:08:58] hack — and there are a lot of hacks like [01:09:00] that happening in papers, if you read [01:09:02] them — is we're going to add several ones [01:09:03] after the word; let's say I would add [01:09:06] twenty ones, basically. okay, so this is [01:09:10] our labeling scheme. now, what should be [01:09:13] the last activation of our network? [01:09:22] sigmoid function — yeah, sigmoid, but [01:09:25] sequential: for every time step you
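That labeling scheme — all zeros, then a fixed run of ones starting right after the wake word ends — is easy to write down. The run of twenty ones follows the lecture; the helper name and the 1000-step output length are invented for illustration.

```python
import numpy as np

def trigger_labels(num_steps, word_end_steps, ones_per_word=20):
    """0/1 targets for trigger-word detection: zeros everywhere, then a run
    of `ones_per_word` ones starting right AFTER each time step where the
    wake word ends (you must hear the word before you can detect it).
    The run of ones is the class-imbalance hack from the lecture."""
    y = np.zeros(num_steps, dtype=int)
    for t in word_end_steps:
        y[t + 1 : t + 1 + ones_per_word] = 1
    return y

# hypothetical example: 1000 output time steps, "activate" ends at step 400
y = trigger_labels(1000, [400])
```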
would use a sigmoid to output 0 or 1, basically. [01:09:31] don't worry if you don't understand [01:09:33] specifically what networks we're using — [01:09:35] you're going to learn it in a few weeks. [01:09:37] so the architecture should probably be [01:09:39] like a recurrent neural network; [01:09:42] convolutional networks might work as [01:09:44] well, we'll see it later on in the course. [01:09:47] and the loss function should be the same [01:09:49] as before, but we should make it [01:09:50] sequential: for every time step we should [01:09:52] use a loss function like that, and we [01:09:54] should sum them over all the time steps. [01:09:57] sounds good? so another insight on this [01:10:02] project I'll point out is: what was [01:10:05] critical to the success of this project? [01:10:07] I think there are two things that are [01:10:08] really critical when you build [01:10:10] such a project. the first one is to have [01:10:13] a strategic data acquisition [01:10:16] pipeline. so let's talk more about that: [01:10:19] we said that our data should be [01:10:21] 10-second
audio clips that contain [01:10:23] positive and negative words from many [01:10:25] different accents. [01:10:27] how would you collect this data? [01:10:38] right. [01:10:42] yes — you said you'd pay people to give you [01:10:50] ten seconds of their voice. yes, but I [01:10:55] think you can take your phone and go [01:10:57] around campus — and that's actually how we [01:10:59] did it: we took our phones, we went around [01:11:02] campus, and we got some audio recordings. [01:11:04] so one way to do it is to go and [01:11:07] get ten-second audio recordings from [01:11:09] different people with a large [01:11:11] distribution of accents, and then what do [01:11:13] you do — you label, you label by hand. [01:11:16] that's one method. is it long or short, is [01:11:20] it quick or not? it's super slow, [01:11:23] yeah. oh — [01:11:27] subtitles in movies, all right, that's a [01:11:29] good idea actually: depending [01:11:32] on the licensing of the movie, you could [01:11:36] take the audio from a movie, and you [01:11:39] get the subtitles, and you're looking for [01:11:41] "activate", and every time the subtitles [01:11:43] say "activate" you could label your data. [01:11:45] that's super fun — that's pretty good [01:11:47] actually, you could label automatically [01:11:49] using that. yeah, so that's a good idea. I [01:11:52] think there's another way to do it that [01:11:54] is close to that, which is: we're going [01:11:56] to collect three databases. the first one [01:11:59] is going to be the positive word [01:12:01] database, the second one is going to be [01:12:03] the negative word database, and the third one [01:12:05] is going to be the background noise [01:12:07] database. so I take the background, ten [01:12:12] seconds; I insert randomly from one to [01:12:16] three negative words, and I insert [01:12:18] randomly from one to three positive [01:12:20] words, making sure it doesn't overlap [01:12:23] with a negative word. okay, [01:12:26] what's the main advantage of this method? [01:12:31] programmatic generation of examples — yeah, [01:12:34] programmatic generation of samples and [01:12:35] automated labeling. I don't hand-label: I know [01:12:39] where I inserted my positive words, so I [01:12:42] just add ones where I inserted them. I can [01:12:45] generate millions of data examples like [01:12:47] that, just because I found the [01:12:49] right strategy to create data. you see the [01:12:52] difference between the two methods — the [01:12:53] one where you have to go out and [01:12:56] collect data, and the one where you just [01:12:58] go out, collect positive words, negative [01:13:01] words, and then find background noise on [01:13:03] YouTube or wherever you have the right [01:13:05] license to use it. it's a big [01:13:08] difference, and this can make one [01:13:11] company succeed compared to another [01:13:12] company — [01:13:13] it's very common. okay, so I would go on [01:13:16] campus, take one-second audio clips of [01:13:19] positive words, put them in the database in [01:13:21] green; take one-second audio clips of [01:13:23] negative words, of the same people as [01:13:25] well, put them in the pink database; and get [01:13:28] background noise from anywhere I can [01:13:29] find it — it's very cheap — and then create [01:13:31] the synthetic data, label it [01:13:33] automatically, and you know, with like [01:13:36] five positive words, five negative words, [01:13:39] five backgrounds, you can create a lot of
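A sketch of that programmatic generation, assuming word clips and background are plain sample arrays and that naive additive mixing is good enough for illustration (the function name, clip lengths, and counts here are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_example(background, positive_clips, negative_clips,
                       n_pos=2, n_neg=2):
    """Overlay randomly chosen word clips onto a background at random,
    non-overlapping offsets. Returns the mixed audio plus the end sample
    of every positive word -- which is all you need to auto-label."""
    mix = background.copy()
    taken = []                                  # (start, end) spans already used

    def place(clip):
        for _ in range(1000):                   # rejection-sample a free slot
            s = int(rng.integers(0, len(mix) - len(clip)))
            if all(s + len(clip) <= a or s >= b for a, b in taken):
                taken.append((s, s + len(clip)))
                mix[s:s + len(clip)] += clip    # naive additive mixing
                return s + len(clip)
        raise RuntimeError("could not find a free slot")

    for i in rng.integers(0, len(negative_clips), n_neg):
        place(negative_clips[i])
    pos_ends = [place(positive_clips[i])
                for i in rng.integers(0, len(positive_clips), n_pos)]
    return mix, pos_ends

# toy "clips": a 16000-sample background, 1600-sample one-second words
bg = 0.01 * rng.standard_normal(16000)
pos = [rng.standard_normal(1600) for _ in range(3)]   # "activate" clips
neg = [rng.standard_normal(1600) for _ in range(3)]   # negative-word clips
audio, ends = synthesize_example(bg, pos, neg)
```

Because `ends` records exactly where each positive word finishes, the label vector (ones right after each end) falls out for free.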
[01:13:45] data points. okay, [01:13:47] so this is an important technique that [01:13:48] you might want to think about in your [01:13:51] project. the second thing that is [01:13:53] important for the success of such a [01:13:55] project is the architecture search and [01:13:58] hyperparameter tuning. so all of you, you [01:14:01] will have complicated projects where you [01:14:02] will be lost [01:14:05] regarding the architecture to use. [01:14:08] at first it's a complicated process to [01:14:10] find the architecture, but you should not [01:14:11] give up, and the first thing I would say [01:14:14] is: talk to the experts. so let me tell [01:14:18] you the story of this project. first, I [01:14:22] started looking at the literature [01:14:24] and figuring out what network I could [01:14:26] use for this project, and I ended up [01:14:28] using that: for the beginning part, I use [01:14:30] a Fourier transform to extract features [01:14:32] from the speech. who's familiar with [01:14:35] spectrograms or Fourier transforms? so, [01:14:37] for the others: think about audio speech [01:14:40] as a 1-D signal, but every
one this signal can be decomposed in a sum of sines and [01:14:43] can be decomposed in a sum of sines and cosines with a specific frequency and [01:14:45] cosines with a specific frequency and amplitude for each of these and so I can [01:14:48] amplitude for each of these and so I can convert a 1d signal into a matrix for [01:14:51] convert a 1d signal into a matrix for with with with basically [01:14:59] basically one axis that is the frequency [01:15:02] basically one axis that is the frequency one axis that is the time going from [01:15:06] one axis that is the time going from going from zero to ten seconds and I [01:15:11] going from zero to ten seconds and I will get the value of all the the [01:15:14] will get the value of all the the amplitude of this frequency so maybe [01:15:16] amplitude of this frequency so maybe this one is a strong frequency this one [01:15:18] this one is a strong frequency this one is a strong frequency this one is a low [01:15:20] is a strong frequency this one is a low one and so on for every time step this [01:15:22] one and so on for every time step this is a spectrogram of an audio speech [01:15:25] is a spectrogram of an audio speech you're going to learn a little bit more [01:15:26] you're going to learn a little bit more about that so after I got the [01:15:28] about that so after I got the spectrogram which is better than the 1d [01:15:29] spectrogram which is better than the 1d signal for the network I would use an [01:15:32] signal for the network I would use an LSD M which is a recurrent neural [01:15:34] LSD M which is a recurrent neural network and add a sigmoid layer after it [01:15:37] network and add a sigmoid layer after it to get probabilities between zero and [01:15:39] to get probabilities between zero and one I would threshold them everything be [01:15:42] one I would threshold them everything be more than 0.5 I would consider that it's [01:15:45] more than 0.5 I would consider that it's a 1 everything last to 
zero. [01:15:48] I tried for a long time fitting this network on the data; it didn't work. But one day I was working on campus and I found a friend who was an expert in speech recognition. He has worked a lot on all these problems, and he knew exactly that this was not going to work; he could have told me. So he told me there are several issues with this network. The first one is that your hyperparameters in the Fourier transform are wrong: go on my GitHub and you will find what hyperparameters are used for this Fourier transform, specifically what sample rate, what window size, what frequencies are used. So that was better. Then he said another issue is that your recurrent neural network is too big; it's super hard to train, and instead you should reduce it. So he told me to use a convolution to reduce the number of time steps of my audio clip.
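As an aside, the spectrogram he describes can be sketched with plain NumPy. This is a minimal sketch: the frame length, hop size, and the toy 440 Hz sine below are illustrative choices, not the actual hyperparameters from his GitHub.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram of a 1D signal via a short-time Fourier transform.

    Returns a (num_freq_bins, num_frames) matrix: one axis is frequency,
    the other is time, exactly the picture described in the lecture.
    """
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # keep only the magnitude of the real-input FFT (frame_len // 2 + 1 bins)
        frames.append(np.abs(np.fft.rfft(frame)))
    # stack frames as columns: rows = frequency bins, columns = time steps
    return np.stack(frames, axis=1)

# a 440 Hz sine sampled at 8 kHz for one second
sr = 8000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(sig)
```

With these settings the frequency resolution is 8000/256 = 31.25 Hz per bin, so the sine's energy lands around bin 14.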
[01:16:42] You will learn about all these layers later. And also use batch norm, which is a specific type of layer that makes the training easier. Finally, you get your sigmoid layer and you output zeros and ones, but because the number of output time steps is smaller than the input, you have to expand it. So you need an expansion algorithm, just a script that expands every zero into two zeros, let's say, every one into two ones, and so on. And now I get another architecture that I managed to train within a day, and this was all because I was lucky enough to find the expert and get advice from this person. So I think you will run into the same problems as I ran into during your projects. The important thing is to spend more time figuring out who is the expert who can tell you the answer, rather than trying out random things. I think this is an important thing to think about. Okay,
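That expansion step is simple enough to write down. A minimal sketch: the factor of two matches his "every zero into two zeros" example, but in practice it would be the ratio of input to output time steps.

```python
def expand_labels(labels, factor=2):
    """Expand each predicted label into `factor` copies, so the shorter
    output sequence lines up with the longer input time axis
    (e.g. every 0 becomes two 0s, every 1 becomes two 1s)."""
    expanded = []
    for label in labels:
        expanded.extend([label] * factor)
    return expanded

print(expand_labels([0, 1, 0]))  # → [0, 0, 1, 1, 0, 0]
```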
so don't give up, and also use error analysis, which we're going to see later. [01:17:44] We have two more minutes, so I'm not going to go over this one in detail; I'm just going to talk about it quickly. There's another way to solve word detection, and that is to use the triplet loss algorithm: instead of using anchor, positive, and negative faces, you can use audio speech clips of one second. The anchor is the word "activate", the positive is the word "activate" said differently, and the negative is another word. You will train your network to encode "activate" into a certain vector and then compare the distance between vectors to figure out whether "activate" is present or not. Okay, we have about two more minutes. [01:18:30] So, just to finish with two more slides: now that you've seen some loss functions, I want to show you another one, and I want you to tell me what application this beautiful loss corresponds to.
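Before the answer, a quick sketch of the triplet-loss setup he just described. The two-dimensional embeddings here are made-up stand-ins for what a trained encoder would produce, and the margin value is an illustrative choice.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embedding vectors: pull the anchor toward the
    positive ("activate" said differently) and push it away from the
    negative (another word) by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance anchor-positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance anchor-negative
    return max(0.0, d_pos - d_neg + margin)

# stand-in embeddings for three one-second clips
a = np.array([1.0, 0.0])   # anchor: "activate"
p = np.array([0.9, 0.1])   # positive: "activate", said differently
n = np.array([0.0, 1.0])   # negative: some other word
loss = triplet_loss(a, p, n)  # near-zero: positive is already much closer
```

At detection time, a clip whose embedding lies within some distance of the "activate" vector is declared a match.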
[01:18:49] This is one of the most beautiful losses I've seen in my life. So, can someone tell me what the application is? What problem are we trying to solve if we use this loss function? [01:19:07] Speech recognition? No, it's not, but good try. Yes? Regression: that's true, it's a regression problem, but it's a specific regression problem. Bounding boxes? Good. Bounding boxes, object detection: this is object detection. So I put the paper here, you can check it out, but how do you know that it's object detection? Oh, you've done it before, okay. So this is the loss function of a network called YOLO, and the reason you can tell it's about bounding boxes is that if you look at the first term, you will see that it's comparing x-true to x-predicted and y-true to y-predicted: this is the center of a bounding box, (x, y). The second term is w and h, which stand for the width and height of a bounding box, and it's trying to minimize the distance between the true bounding box and the predicted bounding box, basically. The third term has an indicator function for objects; it's saying that if there is an object, you should have a high probability of objectness. The fourth term is saying that if there is no object, you should have a low probability of objectness. And finally, the final term is telling you that you have to find the class that is in this box: is it a cat, is it a dog, is it an elephant, whatever. So this is an object detection loss function. [01:20:38] Actually, do you know why you would have a square root here? The reason we have the square root is that you want to penalize errors on small bounding boxes more than on big bounding boxes. So if I give you an image with a human like that, and there are cats like this, you can have something like this.
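That square-root point can be checked numerically. The box widths below are made up for illustration: the same two-pixel error costs more, after the square root, on a small box than on a large one.

```python
import math

def sqrt_width_error(w_true, w_pred):
    """YOLO-style width term: squared difference of square roots."""
    return (math.sqrt(w_true) - math.sqrt(w_pred)) ** 2

# the same 2-pixel error on a small (cat-sized) box vs. a large (human-sized) box
small = sqrt_width_error(10, 12)    # ≈ 0.091
large = sqrt_width_error(100, 102)  # ≈ 0.010
# the raw squared errors (12-10)**2 and (102-100)**2 are identical,
# but after the square root the small box is penalized much more
```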
[01:21:11] The boxes inside are the ground truth; they're very tight boxes. And the outer boxes are the predictions. What's interesting is that a two-pixel error on these cats is much more important than a two-pixel error on this human, because the box is smaller. So that's why you use a square root: to penalize the errors on small boxes more than on big boxes. Okay, and finally, the final slide. Let's go over what we have for next week. You have two modules to complete by next Wednesday: C1M3, with the following quiz and programming assignments, and C1M4, with one quiz and two programming assignments; you're going to build your first deep neural network. This is all going to be on the web, it's already on the website, and we'll publish the slides.
[01:22:10] Now, TA project mentorship is mandatory this week. So, TA project mentorships are mandatory this week, to start; then the week before the project proposal... no, after the proposal, after the project milestone, and before the final project submission. Okay. And in Friday TA sections you're going to do some neural style transfer and art generation. Fill in the AWS form; I don't know if it's been done yet. We're going to try to give you some credits with GPUs for your projects. Okay, thanks guys. ================================================================================ LECTURE 003 ================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 3 - Full-Cycle Deep Learning Projects Source: https://www.youtube.com/watch?v=JUJNGv_sb4Y --- Transcript [00:00:05] All right, everyone. Okay, I guess we're live. So, as Aarthi was saying, please enter your SUNet ID; we can bring this up again at the end of class. Today we're just taking another, like, what, twenty seconds, and then we'll go onto the main discussion. All
right. [00:00:37] So, what I want to discuss with you today is maybe what I'm going to call full-cycle deep learning applications. And so, I think this Sunday you'll be submitting your proposals for the class projects you'll do this quarter, and in a lot of what you learn about machine learning projects, you learn how to build machine learning models. What I want to do today is share with you the bigger context of how a machine learning model, you know, a neural network you might train, fits in the context of a bigger project. So what are all the steps? Right, just as if you're building a software product: you know, you take other classes that teach you how to build a website, for example. But a successful product requires more than just building a website, right? So what are the other things you need to do to actually do a successful
software project, and in this case a successful machine learning application? [00:01:58] (Test, test... could you turn up the audio? How's this? Nope, can't hear me at all? Oh, I think I'm broadcasting, I can hear myself. Great, okay, you can hear me now? Great, thank you. All right, thank you.) All right, so the overview of this is to share with you full-cycle machine learning: you learn a lot about how to build deep learning models, but how does that fit into a bigger project? Right, just as if you're taking a class on building a website: then, great, you know how to code a website, and that's really valuable, but what are all the things you need to do to make a successful website, to build a project that involves launching a website or a mobile app or whatever? So as you plan for your class project
proposals, due this Sunday: if you're doing an application project that fits in the context of a bigger application, also take some of these steps in mind. Right, so, you know, these are what I think of as the steps of an ML project. Really, maybe not a class project, but maybe a serious machine learning application, right? And I think, you know, I've built a lot of machine learning products over several years, so some of these are also things that I wish I had known, like, you know, many years ago. Um, one: this is maybe kind of obvious, but, you know, select a problem, and let's say for the sake of simplicity that you use supervised learning. Right, it turns out that for the CS230 class projects, I think more than 50% of the projects tend to use supervised learning, and then there are also other projects that end up using GANs, which we'll talk about later this quarter, or other things. But I think, you know, let's
say you use supervised learning to build a machine learning application. And I think for today I'm going to use as a running example building a voice-activated device, right? So, you know, I don't know actually how many of you have, like, a smart speaker in your home, like a voice-activated device in your home... in the U.S., well, not that many of you... okay, cool. Yeah, so I think, you know, the Amazon Echos, Google Homes, the Apple Siris, or in China, one of my favorites is built by Baidu. But let's say for the sake of argument that you want to build a voice-activated device, and I'm going to use that as a running example. And so, in order to build a voice-activated device, and again, I'm not going to use any of the commercial brands' wake words, like "Alexa", "Okay Google", or "Hey Siri", or I guess in China it was "Xiaodu Xiaodu", which means kind of
roughly "little Du". Um, but let's use a more neutral word: let's say you wanted to build a device that responds to the word "activate". And you're actually going to implement this as a problem set later this quarter. So you want to build a... (Yeah, okay, no volume... let's see, is this better? Yes? Yeah, this is better. Okay, cool, thank you. Ironic that we don't have speech recognition and the volume is higher now.) Um, so let's say you want to build a voice-activated device. So the key component, the key machine learning, deep learning component, is going to be a learning algorithm that takes as input an audio clip and outputs y to detect what's sometimes called the trigger word. [00:05:53] (Did I go soft again? Okay, this'll be great. All right.) And the output y is, you know, zero or one, to
detect the trigger word, such as "Alexa" or "Okay Google" or "Hey Siri" or "Xiaodu Xiaodu", or "activate", or whatever wake word or trigger word you choose, right. Um, and so [00:06:16] step one is: select a problem. And then, in order to train a learning algorithm, you need to get labeled data, if you apply supervised learning. And then you design a model, and use backprop or some of the other algorithms you learn about (momentum, Adam, various optimization algorithms, gradient descent) to train the model. And then maybe you test it on your test set. [00:06:54] And then you deploy it, meaning you start selling these smart speakers and, you know, putting them into, hopefully, end users' homes. And then you have to maintain the system; I'll talk about this later as well. And, and this is not chronological, but one thing that's often done, and I want to talk about it at the end instead, so it's not really step 8, is QA, quality
assurance, which is an ongoing process, right. And so, let's see. So if you want to build a product, if you want to sell a machine learning product, these are maybe some of the key steps you need to work on. Some observations: training the model is often a very iterative process. Every time you train the machine learning model, you find that, you know... I can almost guarantee that whatever you do, it will not work, at least not the first time, right? And so you find that, even though I've written this as a sequence of steps, when you train the model you're often going: nope, that neural network architecture didn't work, I need to increase the number of hidden units, or change the regularization, or switch the RNN, or switch to a totally different architecture. And sometimes you train the model and go: nope, that didn't work, I need to get more data, right? And so this
is often a very iterative process where you're cycling through several different steps here. [00:08:23] And then I think one distinction that you have not yet learned about in the Coursera, in the deeplearning.ai videos, is how to split up the data into train, dev, and test sets. So I'm going to simplify those details for now, but just as a foreshadowing: what you'll learn later in the deeplearning.ai videos is how to take a dataset and split your entire training set into a set you train on and a set that you actually cross-validate on during development, called a dev set, or development set, or hold-out cross-validation set, plus a separate test set. So you'll learn about this later, but I'm just simplifying a little bit for today. Okay. [00:09:10] So, um, I think the first thing I want to do is ask you a question.
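As a concrete illustration of that split, here is a minimal sketch; the 80/10/10 fractions and the use of a fixed shuffle seed are illustrative choices, not rules from the lecture.

```python
import random

def train_dev_test_split(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a dataset and split it into train/dev/test.

    The dev set (also called hold-out cross-validation set) is used
    during development; the test set is kept separate for the final
    evaluation.
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)
    dev = examples[:n_dev]
    test = examples[n_dev:n_dev + n_test]
    train = examples[n_dev + n_test:]
    return train, dev, test

train, dev, test = train_dev_test_split(range(100))
```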
So we're going to talk through many of these steps. And it turns out that what a lot of machine learning classes do, and do a good job teaching, is focusing on maybe these three steps, or maybe these four steps, right? This is the heart of machine learning: how do you build and train a model. And what I want to do today is spend more time talking about steps one and six and seven, and then just a little bit of time talking about the others, because you kind of need to do the other steps as well if you want to build a deep learning product or a machine learning application, okay. Um, so here's sort of a discussion question. Um, I'm actually curious, if you are selecting a project to work on... [00:10:05] actually, so don't answer this yet; I'll tell you what the question I'm going to ask is, which is: [00:10:15] all right, what properties make for a good candidate
deep learning project. [00:10:20] But don't answer yet; I want to say a few more things before I invite you to answer. All of you, for the last few days, I hope, have been thinking about what project you want to do for this course, and what I want to do is just discuss some properties of a good project to work on, and what is maybe not a good project. And think of this as your chance to give your classmates advice, right, on what your classmates should think about when they decide what is a good project to work on. [00:10:46] So what I want to do today is use this voice-activated thing as a motivating example. There's actually one project I thought of working on but decided not to work on, and that's a voice-activated device. So it turns out
[00:11:09] that these voice-activated devices, Amazon Echos, Google Homes and so on, are taking off quite rapidly in the US and around the world. It turns out that one of the, you know, significant pain points of these devices is the need to configure them, right, to set them up for Wi-Fi. Now, I've done a lot of work on speech recognition; I led the Baidu speech system, I've published papers on speech recognition, and I have one of these devices in my home. [00:11:40] Actually, I have an Amazon Echo in my living room, but even to this day I have configured exactly one light bulb to be hooked up to be controlled by my Echo, because the setup process, not blaming any company, is just difficult: to hook up, you know, a Wi-Fi enabled light bulb, and then to set it up so that your smart speaker or whatever can, when you say, you know, "smart
device, turn off the lamp." [00:12:05] So I have one light bulb in my living room, right, that I can turn on and off, and that's it, even as a speech researcher. [00:12:14] So one application that I think is actually worth working on is to build an embedded device that you can sell to lamp makers, so that, I don't know where you buy your lamps from, but you could buy a desk lamp where there's already a built-in microphone, so that without needing to connect this thing to Wi-Fi, you know, hey, here's a twenty dollar desk lamp, put it on your desk, and you can go home and say "desk lamp, turn on" or "desk lamp, turn off." [00:12:53] Then I think that would help a lot more users get voice-activated devices into their homes. And if you want to turn on a desk lamp, it
[00:13:04] is actually not clear to me that you want to turn to a smart speaker and say "hey smart speaker, please turn on that lamp over there"; it may be easier to just talk directly to the desk lamp and tell it to turn on or turn off. [00:13:16] And so some friends and I evaluated this, and we actually thought that this could be a reasonable business: to build embedded devices to sell to lamp makers or other device makers, so that they can sell their own voice-activated devices without needing this complicated Wi-Fi setup process. [00:13:32] And so to do this, you would need to build a learning algorithm, and have it run on an embedded device, that just inputs an audio clip and outputs, you know, whenever it detects the wake word. And instead of the wake word being "activate," the wake word would be "lamp turn on" or "lamp turn off"; you need two wake words, or trigger words,
[00:13:52] one for when to turn it on and one for when to turn it off, right. And the other thing that I think would make this work is to give these devices names. So if you have five lamps, or two lamps, you need a way to index into these different desk lamps. [00:14:07] So let's say you decide, for your project, you know, to have a little switch here, so this lamp could be called John or Mary or Bob or Alice, like a four-way switch, so that depending on where you set this four-way switch you can say, you know, "John, turn on," right, if you decide to call this lamp John. I guess we'd give lamps a few different names so you don't have every lamp with the same name. [00:14:34] Okay, so what I'm going to do is use this as a motivating example, as a possible project. Oh, and I'm not working on this; if any of you want to build a startup doing this, go for it. I think my teams and I have way better
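The detector being described, input an audio clip, output a signal whenever a trigger phrase is heard, can be sketched roughly as follows. This is only an illustrative stand-in, not the course's starter code: the spectrogram parameters, the `TinyTriggerDetector` name, and the per-frame logistic scorer (where a real system would use a small recurrent or convolutional network) are all my assumptions.

```python
import numpy as np

def spectrogram(audio, frame=256, hop=128):
    """Magnitude STFT of a 1-D signal -> (num_frames, frame//2 + 1) array."""
    window = np.hanning(frame)
    frames = [audio[i:i + frame] * window
              for i in range(0, len(audio) - frame, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

class TinyTriggerDetector:
    """Per-frame logistic scorer for two trigger words ('turn on' / 'turn off').
    Weights are random here; training them is the actual project."""
    def __init__(self, n_freq, n_words=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_freq, n_words))
        self.b = np.zeros(n_words)

    def __call__(self, audio):
        S = spectrogram(audio)                 # (frames, n_freq)
        logits = S @ self.W + self.b           # (frames, n_words)
        return 1.0 / (1.0 + np.exp(-logits))   # per-frame probabilities

# One second of fake audio at an assumed 16 kHz sample rate.
detector = TinyTriggerDetector(n_freq=129)     # 129 = 256 // 2 + 1
probs = detector(np.random.default_rng(1).standard_normal(16000))
```

At deployment the device would run something like this over a sliding window and fire when one column of probabilities stays above a threshold for a few consecutive frames.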
ideas, so we want to do other things. [00:14:51] But I don't see anything wrong with this; I think this actually could be a reasonable thing to pursue yourself, and I'm not doing it, so you're very welcome to if you want. [00:15:00] Okay, so now the question that I want to pose to you is: when you're brainstorming project ideas, you know, like this idea or some other idea, what are the things you would want to watch out for? What are the properties that you would want to be true in order for you to feel good proposing this as a CS230 project? [00:15:18] So you should take a minute and write this down. If a friend is asking you, "what are the things I should look at to see if something is a good project," what would you recommend to them? So feel free to just write down a few keywords, and then we'll see what people say,
[00:15:38] and then I'll tell you what I tend to look out for when I'm selecting projects; I have a list of five points. Let's take, like, I don't know, two minutes. [00:15:53] Oh, sorry, this is not activated, you're not able to answer; it's not letting you enter answers. Okay, just checking, yeah, it won't connect to the internet. Aarti, any ideas? Oh, I see, okay. All right, let me try that. It's working now? Okay, thank you, yes, thank you. [00:17:10] So take like two minutes to enter, and I think I can configure this to let you enter multiple answers. [00:18:07] All right, another one minute thirty seconds. [00:19:04] Okay, three, two, one. [00:19:10] Well, maybe in hindsight that wasn't the best visualization; can people see this? [00:19:41] All right, so, reading some of these off: lots of data, some of these are small, human-doable, number of examples, doable in two months, novel, clear objective, practical, useful, okay, finish in time, real-life problem, useful, hasn't been done,
[00:20:08] computationally tractable, yeah, generalization, cool, great. [00:20:16] Let me make some comments on these; I think this is pretty good. I had a list of five bullet points, so maybe I'll just share with you my list of five, which is, I mean, just some things I encourage you to pay attention to; this may or may not be the best criteria. But interest: hopefully you'd work on something that you're actually interested in. [00:20:45] And then, right, data availability, which many of you cited, is a good criterion. One of the ways that Stanford class projects sometimes do not go well is if students spend a month trying to collect data, and after a month still haven't found the data they need, and then, you know, there's a lot of wasted time. [00:21:06] One thing that I would encourage you to consider as well is domain knowledge,
[00:21:17] and I think that if you are a biologist and have unique knowledge of some aspect of biology to which you want to apply machine learning, that will actually let you do a very interesting project, right, that is actually difficult for others to do. [00:21:34] And I think more generally, as advice for navigating your careers: because in AI, machine learning, deep learning, there are so many people wanting to jump into machine learning and deep learning, let me actually give an example. I sometimes talk to doctors or radiology students, including at Stanford and other universities, radiology students that want to learn about machine learning, right, because they hear about, you know, deep learning maybe someday affecting radiologists' jobs, and so they want to be part of deep learning. [00:22:01] And my career advice to them
[00:22:08] is usually not to forget everything they learned as a doctor, try to, you know, do machine learning 101 from scratch, and just become a CS major. I think that path can work, but I think where radiologists could do the most unique work, that allows them to make the most unique contribution, is to use their domain knowledge of healthcare and radiology and do something in machine learning applied to radiology, right. [00:22:38] And so, all right, how many millennials are there in this class? What does that mean? All right, this is rendered really large; yeah, I think it's because it's a word cloud, so everything is counted by word frequency, right. The "money" thing, I don't know, I have very mixed feelings about that. [00:23:04] All right, but actually I know that some of you are taking, you know, deep learning because you work in a different discipline and you want to do something in
this hot, new, exciting thing of machine learning. [00:23:15] And I think whatever discipline you're in, if you have domain knowledge about some other area, you know, education, civil engineering, biology, law, taking deep learning allows you to do very unique work applying machine learning to your domain, right. [00:23:38] Let's see. I think, well, I called it utility, but several of you mentioned as well something that has a positive impact, that helps other people; and, I don't know, money could be an aspect of utility, but maybe not the most inspiring one. [00:23:58] And then I think one of the biggest challenges we face in the industry today, frankly, is still actually good judgment on feasibility. So today I still see too many leaders, sometimes CEOs of large companies, that stand on stage and announce to the whole world, you know, we're going to do this machine learning project
to do this by this deadline, [00:24:20] and then twenty minutes later I talk to their engineers and they say no, there's no way, it's not happening; whatever the CEO just announced on stage, the engineering organization is not doing it and knows it's impossible. So I think one of the biggest challenges is actually feasibility. [00:24:34] In fact, I actually chatted with Aarti about the TA office hours, and I know that a lot of you have been thinking about applying end-to-end deep learning, right: you know, can you input any X and output any Y and do that accurately? And sometimes it's possible and sometimes it's not, and it still takes relatively deep judgment about what neural networks can and cannot do with a certain amount of data, which you may or may not be able to acquire, in order to do some of these things. [00:25:04] So I think throughout this quarter you'll gain much deeper judgment as
well on [00:25:10] what is feasible. And I guess one last thing: I once knew a CEO of a very large company that actually gave his team these instructions; he said, "I want you to assume that AI can do anything." And I think that had an interesting effect, I guess. Yeah, cool. [00:25:38] All right, so step one was select a project; I hope when you're selecting a project you keep some of those things in mind. Step two is get data. [00:25:55] So what I want to do is pose a second question and then have some of you discuss it. Let's say that you're actually working on this, you know, smart voice-activated embedded device thing, right. So let's say that you and your friends want to found a startup, to train a deep learning algorithm to detect, you know, phrases like "John, turn on," "Mary, turn off," or "Bob, turn off," or whatever, to sell
to device makers, [00:26:21] so that they can have low-cost embedded voice detection that doesn't require a complicated Wi-Fi setup process, right. So let's say that you want to do this; you need to collect some data in order to start training a learning algorithm. [00:26:32] Okay, so the second question I pose to you is a question in two parts, but answer both at the same time, which is: let's say you actually propose this for your CS230 project this Sunday, and then you start work on it, you know, like on Monday, or maybe you even start working on it today, before the proposal. How many days would you spend collecting data, and how would you collect the data? [00:27:03] And actually, how many of you have participated in an engineering scrum, if you know what that means? Oh, okay, a few of you, you've seen the industry. Okay, all right, so
engineering estimation: [00:27:13] when you estimate how long a project takes, one of the common practices is to use the Fibonacci sequence to estimate how long the project will take, right. And so the Fibonacci sequence is 1, 1, 2, 3, 5, 8, 13, and so on; it's roughly exponential, but doesn't grow as fast as powers of 2, and Fibonacci numbers are cool, right. [00:27:36] So what I want you to do... let me see if I have a configuration here with speech bubbles. Okay, yeah, that's good. [00:27:58] All right, so what I'd like you to do is, in the text answer, write two things: one is a number, how many days do you think you'd spend on collecting data, you and your teammates, if you were actually doing this project; and then, how would you go about collecting the data? Okay, so take like another two minutes to write in an answer. [00:28:28] Oh, I'm sorry, they're still not activated. That's what I'm trying to hit. Oh, you think that it's not helping?
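The estimation scale mentioned here is easy to generate; a quick sketch (the `fib_scale` helper name is mine):

```python
def fib_scale(n):
    """First n points of the Fibonacci estimation scale: 1, 1, 2, 3, 5, 8, 13, ..."""
    a, b, points = 1, 1, []
    for _ in range(n):
        points.append(a)
        a, b = b, a + b
    return points

scale = fib_scale(10)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# Successive ratios settle toward the golden ratio (~1.618): the scale grows
# exponentially, but slower than doubling, as noted in the lecture.
ratios = [later / earlier for earlier, later in zip(scale, scale[1:])]
```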
[00:28:55] All right, damn, it's definitely not helping. All right, let's do this: write down your answer on a piece of paper first, and then type it in once this is working. So the two questions are: how many days, pick a number from the Fibonacci sequence, and how would you collect the data? [00:29:15] Oh, okay, yeah, let's swap out my computer. Oh, actually, yeah, if Aarti's computer is working I should go ahead; oh yeah, I can just present, yeah, let's plug in your laptop, so you just use your laptop. [00:29:43] I don't know if it was a network problem or a web browser problem; I started using Firefox recently, in addition to Chrome and Safari, and that was Firefox; I'll try other web browsers later. Okay, thank you, thanks Aarti. [00:30:10] All right, maybe people can take another minute from now; I'll just extend the time a bit. [00:31:04] All right, another ten seconds. [00:31:19] Let's see, let me show you people's answers. Okay, all right, well, 365; so there's
a lot of variance in the answers, right. [00:31:43] I don't know, "download from online": well, it depends on what data you want. It turns out that if you're trying to find data for phrases like "John, turn on," that data doesn't exist online. We tried to find audio clips of the word "activate"; there are some websites with single words pronounced, but not a lot of audio clips, actually. [00:32:02] So say the wake word is the word "activate": there are some websites where you can download maybe ten audio clips of a few people saying "activate," but it's quite hard to find hundreds of examples of different people saying the word "activate." [00:32:24] "Five days." All right, so let me suggest that you discuss with each other, in small groups, what you think would be the best strategy: how many days you'd spend collecting the data, and how
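Since hundreds of natural recordings of a rare wake word are hard to find online, a common workaround is to synthesize training examples: record a handful of wake-word clips, then overlay them at random offsets onto longer background-noise recordings. A rough numpy sketch; the sample rate, the stand-in arrays, and the `overlay` helper are my illustrative assumptions, not anything prescribed in the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
SR = 16000  # assumed sample rate in Hz

def overlay(background, clip):
    """Mix `clip` into `background` at a random offset; return the mixed
    audio plus the (start, end) sample range where the wake word occurs,
    which doubles as the training label."""
    start = int(rng.integers(0, len(background) - len(clip)))
    mixed = background.copy()
    mixed[start:start + len(clip)] += clip
    return mixed, (start, start + len(clip))

# Stand-ins for real recordings: 10 s of noise, a 1 s "activate" clip.
background = 0.1 * rng.standard_normal(10 * SR)
activate = np.sin(2 * np.pi * 440.0 * np.arange(SR) / SR)

example, (start, end) = overlay(background, activate)
```

Each recorded clip can be reused many times with different offsets and backgrounds, so a small number of recordings can yield hundreds of labeled training examples.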
you would go about collecting it — and try to convince the people next to you of your answer. And before I ask you to start discussing, I want to leave you with one thought, which is: how long do you think it will take you to train your first model? If it takes you a day or two to train your first model — and it might take a couple of days, especially if you download an open-source deep learning package and train the model with it — so if the amount of time needed to collect data is X, followed by two days to train your first model, what do you think X should be? Why don't you spend, like, two minutes discussing with each other and compare your answers — there's very large variance, right? Once you've discussed — actually, if the
people sitting next to you are your project partners, you should discuss with them how many days you think you should spend collecting data, and how you would collect your data. Okay, let's take two minutes to discuss. [00:35:04] [Applause] [00:35:52] All right, guys — wow. All right, hey guys. All right, a lot of exciting discussion. So actually, how many of the groups wound up on the low end — how many of you convinced each other that maybe it should be like three days or less? Oh, just a few of you. How come? Someone say why. [Student answers.] Oh yeah — you want to use the data to test how the model works before you make a massive investment, you said. And then — anyone on the high end, like 13 days or more? Very few. How come? Anyone with insights you want to share with the whole class — what were you discussing so excitedly? [00:37:11] [Student:] We acknowledged
that it can actually take a long time, especially for this problem — based on the numbers you gave us it doesn't seem feasible — but they could take, say, movie clips, something with subtitles, and generate labeled sound from them. As far as we were concerned, we just wanted to get our system live, and collecting the data ourselves would take a long time. [Andrew:] Yeah — right. And then a company could build a system to look at subtitled videos, like YouTube videos with captions or something, if there's appropriately Creative Commons data there that you could use. [00:37:51] Yeah. So let me tell you my bias — I'll just tell you what I would do if I were working on this project. Well, one caveat: I have done a lot of work on speech recognition previously, so pretend this was my first project. I would probably spend one to two days collecting data — kind of on the short end, right? And I think that, you know —
one of the reasons is that machine learning — that's the circle I grew up in — is actually a very iterative process, where until you try it, you almost never know what's actually going to be hard about the problem. And so if I were doing this project — you want to see what I would do, okay, now that you've thought about this project a bunch, including, you know, trying to validate market acceptance and so on — I would get a cheap microphone, or use the built-in laptop microphone, or buy a microphone off Amazon or something, and go around Stanford campus, or go to your friends, and have them just say: "Hey, do you mind saying into this microphone the word 'activate,' or 'John, turn on,' or whatever?" — and collect a bunch of data that way. And then, within one or two days, you
should be able to collect at least hundreds of examples, and that might be enough of a dataset to start training a rudimentary learning algorithm to get going. Because if you have not worked on this problem before, it turns out to be very difficult to know what's going to be hard about the problem. So — what's going to be hard? Highly accented speakers, right? Or is what's going to be hard background noise? Or is what's going to be hard, you know, confusing "turn on" with "turn off" when you hear "John, turn..."? When you build a new machine learning system, it's very difficult to know what's going to be hard and what's going to be easy about the problem. What's going to be difficult is far-field — which is the technical term for when the microphone is very far away. So it turns out that, you know, if we turn on the microphone on my laptop now, for example, the laptop — which is like
three meters away from me — will be hearing voice directly from my mouth as well as voice bouncing off the walls. So there's a lot of reverberation in this room, and that makes speech recognition harder. Humans are so good at processing out reverberant sounds — reverberations — that you almost don't notice it, but a learning algorithm in a room like this one sometimes has problems with reverberations, or echoes bouncing off the hard walls. And so, depending on what your learning algorithm has trouble with, you will then want to go back to collect very different types of data, or explore very different types of algorithms. Or maybe the problem is that sometimes the volume is just too soft, in which case, you know, maybe you need to do something else and normalize all the volumes, or buy more sensitive microphones, or something. So it turns out that when building most
machine learning applications — unless you have experience working on that exact problem; I've actually worked on this problem before, so I have a sense of what's hard and what's easy — when you work on a new project for the first time, it's very difficult to know what will be hard and what will be easy. And so my advice to most teams is: rather than spending, say, 20 days to collect data and then two days to train a model — it's often by training a model, and then seeing what examples it gets wrong, where the algorithm fails, that you get the feedback that lets you either collect more data or redesign the model, or try something else. And if you can bring the data-collection period down to be more comparable to how long you end up taking to train your model, then you can start iterating much more rapidly on actually improving your model.
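The iterate-fast loop being described — train something crude quickly, do error analysis on where it fails, and let that drive the next round of data collection — can be sketched in a few lines of Python. Everything below (the clips, the threshold "model," the condition tags) is invented for illustration; it's the shape of the loop that matters, not the numbers:

```python
# Sketch of the iterate-fast loop: train a crude model, then group its
# errors by recording condition to decide what data to collect next.
# All data and the "model" here are made up for illustration.

# Each clip: (mean_volume, recording_condition, is_wake_word)
clips = [
    (0.90, "close_mic", True),  (0.20, "close_mic", False),
    (0.85, "close_mic", True),  (0.15, "close_mic", False),
    (0.40, "far_field", True),   # far-field wake words come in quiet...
    (0.35, "far_field", False),  # ...and overlap with background noise
]

def crude_model(volume, threshold=0.5):
    """First-pass 'detector': call it a wake word if it is loud enough."""
    return volume > threshold

def error_rate_by_condition(data):
    """Error analysis: fraction of misclassified clips per condition."""
    errors, totals = {}, {}
    for volume, condition, label in data:
        totals[condition] = totals.get(condition, 0) + 1
        if crude_model(volume) != label:
            errors[condition] = errors.get(condition, 0) + 1
    return {c: errors.get(c, 0) / totals[c] for c in totals}

rates = error_rate_by_condition(clips)
worst = max(rates, key=rates.get)  # the slice to go collect more data for
print(rates, "-> collect more:", worst)
```

Here the far-field slice fails while the close-mic slice is fine, so the next day of data collection goes to far-field clips — something you could not have known before training the first crude model.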
Right — oh, and maybe one rule of thumb that I tend to recommend for most class projects: if you need to spend up to a week to collect data, you know, maybe that's okay — but you can get going even more quickly, and I would maybe even more strongly recommend that. There have been so few examples in my life where the first time I trained a learning algorithm, it worked. It pretty much never happens. Yeah — it happened once, about a year ago, and I was so surprised that I still remember that one time. And so machine learning development is often a very iterative process — and often datasets are collected through sweat and hard work, right? So literally — actually, if I were working on speech and wanted to get going quickly — I would probably just have myself, or my team members, run around and find people and
ask them to speak into a microphone, and record audio clips that way — and then, only once you've validated that you need a bigger dataset, move to more complicated things, like setting up an Amazon Mechanical Turk job to crowdsource, which I've also done. I've also had very large datasets collected via Amazon Mechanical Turk — but only at a later stage of the project, once you understand what you really need. So as you start work on your class projects, maybe keep that in mind. [00:43:27] So, one other tip — machine learning researchers, on average, tend to be terrible at this, but I'll give this advice anyway. When you're going through this process — yes, on the day you design a model, a literature search would be very helpful, you know, to see what algorithms others are using for this problem. It turns out the literature is actually quite immature: there isn't
a convergence on a well-established set of standard algorithms for trigger-word detection in the literature right now — people are still making up algorithms — so if you do the survey, you'll find that to be the case. But you need to train an initial model, and in most machine learning applications you go through this process multiple times. So one tip that I would recommend: keep clear notes on the experiments you've run. Because so often, as we train a model, you see — oh, this model works great on American-accented speakers but not on British-accented speakers. I was born in the UK, so I get to use British accents as my running example — if you're from a different part of the world, think of a different global accent; since I'm from the UK, I'll just pick on British accents, I guess. Keep clear notes on the experiments you've run, because what
happens in every machine learning project is: after a while, you have trained 30 models, and then you and your team are wondering — "oh yeah, we tried that idea two weeks ago; it didn't work." And if you have clear notes from when you actually did that work two weeks ago, then you can refer back rather than having to rerun an experiment. Oh — the other thing that some groups do is keep a spreadsheet that tracks what learning rate you used, what number of hidden units, what this setting or that setting was — or keep it in a text document — which will make it easier to refer back and know what you tried earlier. This is one piece of commonly given advice, and it's one of those things that every machine learning person knows we should do, but on average we're very bad at doing. But the times I do manage to keep good notes, it saves tons of time
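The spreadsheet habit can be as lightweight as appending one row per training run to a CSV file. A minimal sketch — the column names are only examples, not a prescribed schema:

```python
import csv
import io

# One row per training run, so "did we already try lr=0.01 with 64
# hidden units?" becomes a lookup instead of a memory test.
FIELDS = ["run", "learning_rate", "hidden_units", "dev_accuracy", "notes"]

def log_run(f, **row):
    """Append one experiment record to an open CSV file handle."""
    csv.DictWriter(f, fieldnames=FIELDS).writerow(row)

buf = io.StringIO()  # stands in for open("experiments.csv", "a")
csv.DictWriter(buf, fieldnames=FIELDS).writeheader()
log_run(buf, run=1, learning_rate=0.01, hidden_units=64,
        dev_accuracy=0.91, notes="fails on British accents")
log_run(buf, run=2, learning_rate=0.001, hidden_units=128,
        dev_accuracy=0.94, notes="added far-field clips")
print(buf.getvalue())
```

Two weeks later, "we already tried that idea" becomes a search through the log rather than a rerun of the experiment.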
Right — it beats trying to remember what exactly you tried two weeks ago. Okay. So, um, a lot of this class will be about this process: how to get data, develop the train/dev/test sets, design a model, train the model, eventually test the model, then iterate. A lot of this course is on this. So let me jump ahead to when you have a good-enough model and you want to deploy it. Okay — so step six, [00:45:54] I guess, is deployment. Now, this is one of the reasons I wanted to step through this example — going through a concrete example — I find that when you're learning about machine learning for the first time, it's often seeing what my teams tend to call war stories — stories of projects that others have built before — that provides the best learning experience. So, like, I have built speech recognition systems; it took me like a year or two
to do it. So rather than, you know, having you spend two years of your life building speech systems, if I can summarize the war story — right, tell you what the process is like — I'm hoping these concrete examples of what building these systems is like in, you know, large corporations can help you accelerate your learning without needing to get two years of on-the-job experience; you can just hear the salient points. Okay. Now, if you're deploying a system like this — and this is an actual, real phenomenon for deployed speech systems — you get the audio clip, you have a neural network, and, you know, it will output zero or one. And the neural networks that work well will tend to be relatively large — the model size, or its complexity, is relatively high. And if you look at the smart speakers in your
home, you'll recognize that a lot of them are edge devices, as opposed to doing purely cloud computation. Right — so we all know what the cloud is, and what an edge device is: an edge device is the smart speaker that's in your home, or the cell phone in your pocket. So edge devices are, you know, the things that are close to the data, as opposed to the cloud, which is the giant servers we have in our data centers. Right. So, um, because of network latency, and because of privacy, a lot of these computations are done on edge devices — like the smart speaker in your home, or like — I guess "Hey Siri" or "Okay Google" can wake up your cell phone, right? And so edge devices have much lower computational budgets and much lower power budgets — limited battery life, much less powerful processors than we have in our cloud data centers. And so it turns out that serving up a very large neural network is
quite difficult. Right — it's very difficult for, you know, a low-power, inexpensive microprocessor sitting in the smart speaker in your living room to run a very large neural network with a lot of hidden units, with all the parameters. And so what is often done is actually this: input the audio clip, and then have a much simpler algorithm figure out if, you know, anyone is even talking. Because the smart speaker in my living room, say, hears silence most of the day, right — because usually there's just no one at home; no voice. And then, only if it hears someone talking does it feed the audio into the big neural network that you've trained — ramping up to use a larger power budget — in order to classify 0/1. Okay — this component goes by many different names; the terminology is reasonably standard, but not totally standard, in the literature.
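That gating arrangement — a cheap always-on check that wakes the expensive network only when it fires — can be sketched as below. The threshold value, the frame format, and the stand-in "big network" are all made-up assumptions for illustration:

```python
# Two-stage pipeline sketch: a cheap energy check (the simple algorithm)
# gates the large wake-word network, so silence costs almost nothing.
EPSILON = 0.01  # made-up energy threshold; would be tuned on real audio

def simple_vad(frame, epsilon=EPSILON):
    """Cheap check: does this audio frame carry any sound energy?"""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > epsilon

def process(frame, big_network):
    """Run the expensive model only when the cheap check fires."""
    if not simple_vad(frame):
        return 0  # silence: skip the neural network entirely
    return big_network(frame)

calls = []
def fake_big_network(frame):  # stand-in for the trained wake-word model
    calls.append(len(frame))
    return 1

silence = [0.001] * 160    # ~10 ms of near-silence at 16 kHz
speech = [0.3, -0.4] * 80  # loud oscillating samples, clearly non-silent

print(process(silence, fake_big_network), process(speech, fake_big_network))
```

On the silent frame the large model is never invoked — which is the whole point on an edge device's power budget; only the loud frame reaches it.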
I'm going to call this VAD, for voice activity detection. It turns out that voice activity detection is a standard component in many different speech recognition systems. If you are using a cell phone, for example, VAD is a component that tries to figure out whether anyone is even talking — because if it thinks no one is talking, then there's no need to encode the audio and try to transmit it over the network, right? Um — and so the next question I want to ask you — and I thought this was timely — well, there's a couple of options, right? Option one is to build a non-machine-learning-based VAD system — voice activity detection system — which just, you know, checks whether the volume of the audio your smart speaker is recording is greater than some threshold epsilon, so the silence just gets thrown out. And option two is to train a small neural network to recognize, you know, human speech. And so my next
question to you is: if you worked on this project, would you pick option one, or would you pick option two? Right — as you work through it — oh, sorry: I said train a small neural network; so option two is a small neural network, or in some cases I've seen people use a small support vector machine as well, for those of you who know what that is — a small model that can be run with a low computational budget. It's a much simpler problem to detect whether someone is talking than to recognize the words they say, so you can actually do this with reasonable accuracy with a small neural network. But if you actually worked on this project for CS 230, which would you try? So, could we go to the next question? [00:51:48] Yeah — yeah, you can leave them on screen, I guess. And — I mean — why, are you afraid of the other projection? Cool. I'll just keep unlocking it periodically. [00:52:02] Are people able to vote? No? They're — no? Ah — well, I see: I guess you write so much code you
[00:52:49] All right, people are answering quickly. Another 20 seconds or so, if that's enough time to get your answers in. [00:53:20] All right, cool. [00:53:28] That's fascinating: there's a lot of disagreement in this class. Would people say why? Why would you choose option one, and why would you choose option two? I have a very strong point of view on what I would do, but I'm curious: why option one, and why option two? Go ahead.

[00:54:15] [Student:] With option two you can simplify the problem; it only activates the machine when it knows speech exists, so option two would be much better.

[00:55:14] [Student:] What about when someone's whistling? And if you're in a noisy place (I have a friend who lives next to a train station), option one would pick up a lot of the train noise. Whatever you use has to run constantly, so you want something low-power; it seems like option one is better, because it has to keep running constantly and you still want low power consumption.

[00:55:40] So let me show you the pros and cons. I think there are pros and cons to option one versus option two, which is why there were so many votes for both options. I personally would choose option one, but let's just discuss the pros and cons. First, option one is just a few lines of code. Yes, maybe option two isn't that complicated, but option one is even simpler. And actually, I would say that if I hadn't worked on this problem before, I would choose option one. Since I have experience with speech recognition, I know you eventually need option two, but that's only because I've worked on
this problem before. If it's your first time working on a speech application, I would encourage you, on average, to try the really simple, quick-and-dirty solution first and go from there. So let's see: how long would it take you to implement option one? I would say something like ten minutes, five minutes, I don't know. And how long would option two take? Four hours? One day? I don't really know, actually; let me just write one day, and I'm not quite sure.

[00:56:52] But if option one can be done in ten minutes, then I would encourage you to do that: go ahead and put the smart speaker in your home, or in your potential users' homes, and only when you find out that the dog barking is a problem, or the train on the railway, or whatever, go back and invest more in fixing it. In fact, it's true that maybe it's annoying that the dog barking keeps waking up the system, but maybe that's okay, because if the large neural network downstream then screens out all the dog barking, the overall system performance is actually just fine, and you now have a much simpler system.

[00:57:34] But it turns out that the reason you might eventually need to go to option two is that some homes are in noisy environments with constant background noise, and that would keep the large neural network running too frequently. So if you have a large engineering budget (and it's not small: the smart speaker teams have hundreds of engineers working on this), then with hundreds of engineers, sure, option two will perform better. But if you're a scrappy startup team, with three of you working on a class project, the evidence that you need that level of complexity is not that high. I would really do option one first, and use that to gather evidence that you really should make the investment in a more complex system before actually making that investment of days. Eventually, I think it's one day to your first prototype, and then eventually it will get more complicated.

[00:58:30] It turns out the other huge advantage of the simple method is the following. Frankly, this is actually one of the big problems and big weaknesses of machine learning algorithms and deep learning algorithms: when you build a system and ship a product, the data will change. I'm going to simplify the example a bit. I know Stanford is very cosmopolitan (Palo Alto is very cosmopolitan), so when you
collect data in this region, you get accents from people all over the world, because, well, that's Stanford. But just to simplify the example a little bit, let's say that you train on U.S. accents. Then, for some reason, when you ship the product, maybe it sells really well in the UK, and you start getting data with British accents. One of the biggest problems you face in practical deployments of machine learning systems is that the data you train on is not going to be the data you need to perform well on. I'm going to share with you some practical ideas for how to handle this, but this is one of those practical realities of machine learning that is actually not talked about much in academia, because it turns out the datasets we have in academia are not set up well for researchers to study and publish papers on this. I think we can design new machine learning benchmarks for it in the future, but it's one of those problems that is actually kind of underappreciated in the academic literature, even though it's a problem facing many, many practical deployments of machine learning algorithms.

[01:00:32] More generally, the problem is one of data changing. You might get new classes of users with new accents. Or you might train on data from, say, Stanford users, and maybe Stanford is not too noisy, or has certain characteristics, and when you ship to another city or another country, there's much more noise, or different background noise. [01:01:02] Or you start manufacturing the smart speaker, and to lower the cost of the speaker, they swap out the high-end microphone you used on your laptop to collect the data for a low-end microphone. This is a very common thing in manufacturing: if you can use a cheaper microphone, why not? And often, to human ears, the sound sounds just fine on the cheaper microphone. But as for your learning algorithm: I use a Mac, and the Mac has a pretty decent microphone, so if you train on data collected with your laptop or Mac, and the device eventually ships with a different microphone, it may not generalize well.

[01:01:44] So one of the challenges of machine learning is that you often develop a system on one dataset, and then, when you ship a product, something about the world changes, and your system needs to perform on a very different type of data than what it was trained on. What happens is that after you deploy the model, the world may change, and you often end up going back to get more data and redesign the model.
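To make the microphone-swap failure mode concrete, here is a hypothetical sketch (all numbers are invented for illustration): a detector tuned against one device's audio levels can silently break when the production hardware records more quietly, even though the speech sounds fine to a human ear:

```python
# Hypothetical illustration: epsilon was chosen while developing against
# laptop-microphone recordings; the cheaper production microphone records
# the same speech at a lower level. All values are made-up.
def is_speech(rms, epsilon=0.01):
    return rms > epsilon

laptop_speech_rms = 0.05                 # typical speech level in the dev data
cheap_mic_gain = 0.1                     # production mic records 10x quieter
shipped_speech_rms = laptop_speech_rms * cheap_mic_gain  # 0.005

print(is_speech(laptop_speech_rms))   # -> True: works in development
print(is_speech(shipped_speech_rms))  # -> False: the same speech is now missed
```

The fix here is trivial (recalibrate epsilon), but the same pattern with a trained model means going back for new data, which is exactly the maintenance loop described next.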
[01:02:23] This is the maintenance of a machine learning model. Let me give some examples. Web search: this happens all the time at multiple search engines. You train a neural network, or you train a system, to give relevant web search results, but then something about the world changes. For example, there's a major public event: some new person is elected president of some foreign country, or there's a major scandal, or just the internet changes. Actually, here's what happens in China, where new words get invented all the time. China has Baidu rather than Google, and the Chinese language is more fluid than the English language, so new words get invented all the time. The language changes, and so whatever you trained just isn't working as well as it used to. Or maybe a different company suddenly shuts their entire website out of your search index because they don't want you indexing it, so the internet changes, and what you had built doesn't work anymore.

[01:03:33] Or it turns out that if you build a self-driving car in California and then try to deploy those vehicles in Texas, traffic lights in Texas look very different from traffic lights in California. So even though it was trained on California data, a neural network trained to recognize California traffic lights actually doesn't work very well on Texas traffic lights. I'm trying to remember which way round it is; I think Texas has a different distribution of horizontal versus vertical traffic lights, for example. Humans don't even notice. You go, "oh yeah: red, yellow, green." But the learning algorithm doesn't actually generalize that well if you go to different locations. Go to a foreign country, and again the traffic lights, the signage, the lane markers all change.

[01:04:19] Or, one example I was working on earlier this week: manufacturing. I've been working on visual inspection of parts in factories. If you're doing visual inspection in a factory and the factory starts making a new component (they're making this model of cell phone, but cell phones turn over quickly, so a few months later they're making a different type of cell phone), or something changes in the vacuum process, so the lighting changes, or there's a new type of defect: the world changes.

[01:04:59] So what I'd like to do is actually revisit the previous question in light of this "the world changes" phenomenon. Let's say you've collected all your data from American-accented speakers,
and then you ship the product in the UK, and then for some reason you find all these British-accented speakers trying to use your smart speaker. Between these two algorithms, the non-machine-learning approach I described (the threshold) versus training a neural network, which system do you think would be more robust for VAD, voice activity detection?

[01:05:55] All right, take another 40 seconds or so. [01:06:36] All right, interesting. Does anyone want to comment? More people voted for non-ML; can someone explain why? [01:07:14] [Student:] For the VAD, the voice activity detection, if you just measure the volume, then it doesn't really depend on the accent, so non-ML might be more robust. Anyone else?

[01:07:28] All right, so let me show you what I thought. It turns out that if you train a small neural network on American-accented speech, there's a bigger chance that your neural network, because it's so clever, will learn to recognize American speech and have a harder time generalizing to British-accented speech. So here's one way the non-ML approach could fail to generalize: if British speakers were systematically louder or softer than American speakers. I don't know; I have no data on whether Americans are systematically louder or less loud than the British. But if one country just had louder or softer voices, then maybe the threshold you set wouldn't generalize well. That seems unlikely, though; I don't see that happening realistically. Whereas if you train a neural network with a lot of parameters, it's more likely that the neural network will pick up on some idiosyncrasy of American accents to decide whether somebody is even speaking, and thus may be less robust at generalizing to British-accented speech.

[01:08:44] Another way to think about this: to take an even more extreme example, imagine you're using VAD for a totally different language than English. Take Chinese, or Hindi, or Spanish, or something where the sounds are really different. If you create a VAD system to detect English, it may not work at all for detecting Spanish or Chinese or French or some other language. And if you think of British accents as somewhere on that spectrum, nothing like as different as a foreign language, but just more different, then I think the non-ML system is more likely to be robust. So one lesson that a lot of machine learning teams learn the hard way is this: if you don't need to use a learning algorithm for something, if you can hand-code a simple rule, like "if
volume is greater than 0.01, do this," those rules can be more robust. One of the reasons we use learning algorithms is for the things we can't hand-code: I don't know how to hand-code something to detect a cat, or to detect a car on the road, or to detect a person, so use learning algorithms for those. But when there is a hand-coded rule that actually does pretty well, you'll find that it is more robust to shifts in the data and will often generalize better.

[01:10:05] And if any of you take, well, we talk about this a little bit in CS 229, and I think CS 229A covers it as well: this particular observation is backed up by very rigorous learning theory. The learning theory basically says that the fewer parameters you have, while still doing well on your training set, that is, if you can have a model with very few parameters that does well on your training set, the better you generalize.
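Here is a toy numerical sketch of that point, with entirely made-up data (the "pitch" feature and both detectors are my illustration, not anything from the lecture): a one-parameter volume threshold transfers to a shifted "accent", while a flexible model that latched onto a training-set idiosyncrasy does not:

```python
# Made-up toy data: each clip is (volume, pitch); "pitch" stands in for an
# accent-specific idiosyncrasy a flexible model could latch onto.
def make_clips(pitch):
    speech = [(0.2 + 0.01 * i, pitch) for i in range(50)]   # loud clips
    silence = [(0.0001 * i, pitch) for i in range(50)]      # quiet clips
    return speech, silence

def threshold_vad(clip, epsilon=0.01):
    # One parameter: is the volume above epsilon?
    return clip[0] > epsilon

class MemorizingVAD:
    """Deliberately overfit model: calls a clip speech only when its pitch
    matches a pitch seen alongside speech during training."""
    def fit(self, speech):
        self.pitches = {pitch for _volume, pitch in speech}
    def predict(self, clip):
        return clip[1] in self.pitches

def accuracy(detect, speech, silence):
    correct = sum(detect(c) for c in speech) + sum(not detect(c) for c in silence)
    return correct / (len(speech) + len(silence))

us_speech, us_silence = make_clips(pitch=120)   # training distribution
uk_speech, uk_silence = make_clips(pitch=150)   # shifted distribution

overfit = MemorizingVAD()
overfit.fit(us_speech)

print(accuracy(threshold_vad, uk_speech, uk_silence))    # -> 1.0
print(accuracy(overfit.predict, uk_speech, uk_silence))  # -> 0.5
```

The threshold only ever looked at volume, so the pitch shift cannot hurt it; the memorizer keyed on the training pitch and falls to chance on the shifted data.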
So there's very rigorous machine learning theory that basically says exactly that. And in the case of the non-machine-learning approach, there's maybe one parameter, which is the threshold epsilon, and if that fits well on your training set, then your odds of generalizing, even when the data changes, are much higher.

[01:10:52] Now, the last question I want to pose for discussion today is about deployments. One of the lessons of deployment is that this is just the way the world works: you build a machine learning system, you deploy it, the world will usually change, and you often end up collecting data, integrating it, and improving the model, maybe fixing it to work for British speakers, or whatever. So, we talked about edge deployments as well as cloud deployments.
ignoring issues [01:11:24] appointments and so um ignoring issues of user privacy and latency which is [01:11:26] of user privacy and latency which is super important but for purposes [01:11:28] super important but for purposes question let's let's let's put aside [01:11:30] question let's let's let's put aside issues of user privacy and network [01:11:32] issues of user privacy and network latency if you need to maintain the [01:11:34] latency if you need to maintain the model sorry maintenance means updating [01:11:36] model sorry maintenance means updating the model right even as the world [01:11:38] the model right even as the world changes [01:11:40] sorry I missed mr. history does a column [01:11:43] sorry I missed mr. history does a column or H deployment make maintenance easier [01:11:46] or H deployment make maintenance easier if not of right why don't you watch you [01:11:48] if not of right why don't you watch you just enter a one-word answer [01:11:50] just enter a one-word answer and why right and so maintenance is [01:11:53] and why right and so maintenance is going the world changes something [01:11:55] going the world changes something changes so you need to update the [01:11:56] changes so you need to update the learning model to take it back take care [01:11:58] learning model to take it back take care of this British accent so which type of [01:12:01] of this British accent so which type of deployment makes it easier let me just [01:12:08] deployment makes it easier let me just take like yeah another two minutes and [01:12:10] take like yeah another two minutes and your answers [01:13:13] all right another 50 seconds [01:13:57] all right cool see what people wrote Wow [01:14:08] all right cool see what people wrote Wow cool great all right almost everyone the [01:14:09] cool great all right almost everyone the same college most people are saying [01:14:17] same college most people are saying cloud alright cool [01:14:21] cloud alright cool great 
And then just to summarize: [01:14:23] I think there are two reasons why most people say this is easier. Push updates, that's part of it. I think the other part is that if all the data lives at the edge, if all the data is processed, you know, at the user's home, and then things slow to a crawl, then even if you have all these unhappy British-accented users, you may not even find out, right? You sit at the company headquarters and you have all these users that mysteriously seem to not be using your device, maybe because they're unsatisfied with it, but if the data isn't coming into your service in the cloud, then you may not even find out about it. Now, there are serious issues about user privacy, as well as security, so please, if you ever build a product, please be respectful of that, and take care of it in a very thoughtful and respectful way toward your users.
But first, so this is the cloud: [01:15:11] if you have a lot of edge devices and all the data is processed there, um, you won't even know what your users are doing, whether they're happy or unhappy; you just don't know. But if some of the data is streamed to your service in the cloud, and where user privacy is concerned, please use good user consent, tell people what you're doing with the data; if you take care of that in a reasonable and sound way, and you're able to examine some of the data, then you can at least figure out that, gee, it looks like, analyzing the data, there are these people with this accent or this background noise that are getting a bad experience. And you can also maybe gather the data from the edge to feed back to your model. So it lets you detect that something's gone wrong, and it lets you have the data to retrain the model to solve the British
[01:16:04] accent problem; you can retrain the model for, you know, British accent speakers, and then finally it lets you push the model back out. Okay, so: one, it lets you detect what's going on; two, it gives you data for training; and three, it lets you more easily push the new model to production, that is, deployment. Okay. Oh, and this is also why, even if your computation needs to run on the edge, if you can, in a way that's respectful of user privacy and transparent about how you use data, get even a small sample of data, or have a few volunteer users send you some data back to the cloud, that will greatly increase your ability to detect that something's gone wrong, and maybe give you some data to retrain the model. So even if you can only do that plus push updates, right, this will help greatly. Okay. Um, all right, so finally, one
last comment. [01:17:01] I think one last challenge is that with a lot of machine learning systems, you're not done at deployment; there's a constant, ongoing maintenance process. And I think one of the processes (you know, AI teams are getting better at this) is to set up QA, to make sure that when we update the model we don't break something. So I think QA, the Quality Assurance process as it's called in large companies, testing: I think the way you test machine learning algorithms is different from the way you test traditional software, because the performance of machine learning algorithms is often measured in a statistical way, right? It's not that it works or it doesn't work; it neither works nor doesn't work; instead it works, you know, 95 percent of the time or something. And so a lot of companies are evolving their QA processes toward this type of statistical testing, to make sure that even
when you change the model and push an update, it [01:17:47] still works, you know, 95 or 99 percent of the time or something. So, putting in place new QA testing processes as well. Okay. All right, I hope that was helpful, stepping through what the full arc of a machine learning project looks like. Later in this course, there will be later lectures where we keep talking about machine learning strategy and how to make these decisions. So let's break for today.

================================================================================ LECTURE 004 ================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 4 - Adversarial Attacks / GANs
Source: https://www.youtube.com/watch?v=ANszao6YQuM
---
Transcript

[00:00:04] Okay, let's get started. So welcome to lecture number four. Today we will go over two topics that are not discussed in the Coursera videos. You've been learning C2M1 and C2M2, if I'm not mistaken, so you've learned about what initialization is, how to tune your neural networks, and what test, validation,
[00:00:31] and train sets are. Today we're going to go a little further. You should have the background to understand 80% of this lecture; there's maybe 20% that I want you to look back at after you've seen the batch norm videos, for those of you who haven't seen them. So we'll split the lecture in two parts, and I put the attendance code back at the very end of the lecture, so don't worry. One topic is attacking neural networks with adversarial examples; the second one is generative adversarial networks. And although these two topics have a common word, which is adversarial, they are two separate topics; you will understand why it's called adversarial in both cases. So let's get started with adversarial examples. In 2013, Christian Szegedy and his team published a paper called "Intriguing properties of neural networks." What they noticed is that neural
networks have kind [00:01:29] of a blind spot, a spot for which several machine learning models, including the state-of-the-art ones that you will learn about (VGG-16, VGG-19, Inception networks, and residual networks), are vulnerable to something called adversarial examples. You're going to learn what these adversarial examples are in three parts: first, by explaining how these examples, in the context of images, can attack a network in its blind spot and make the network classify these images as something totally wrong; then, how to defend against this type of example; and finally, why networks are vulnerable to this type of example, which is a little bit more theoretical, and we're going to go over it on the board. The papers listed at the bottom are the two big papers that started this field of research, so I would advise you to go and read them, because we have only one hour
and a half [00:02:29] to go over two big topics in deep learning, and we will not have the time to go into the details of everything. Okay, so let's set up the goal. The goal is: given a pretrained network, so a network trained on ImageNet, on a thousand classes, millions of images, find an input image that is not an iguana, so it doesn't look like the animal iguana, but will be classified by the network as an iguana. We will call this an adversarial example if we manage to find it. Okay, yeah, one question? [inaudible student question] Let me write it down on the board. Can you guys see? Okay. So we have a network pretrained on ImageNet; it's a very good network. What I want is to fool it by giving it an image that doesn't look like an iguana but is classified as an iguana. So if I give it a cat image to start with, the network is obviously going to give me a vector of probabilities
that has the maximum [00:03:46] probability for cat, because it's a good network, and you can guess what the output layer of this network is: probably a softmax, since it's a classification network. Now what I want is to find an image x that is going to be classified as an iguana by the network. Okay, does the setting make sense to everyone? Okay. Now, as usual, this might remind you of what we've seen together about neural style transfer. Remember the art generation thing, where we wanted to generate an image based on the content of a first image and the style of another image. In that problem, the main difference from classic supervised learning was that we fixed the parameters of the network, which was also pretrained, and we backpropagated the error of the loss all the way back to the input image, to update the pixels so that it looks like the content of the content image
and the style of the style image. [00:04:46] The first thing we did is that we rephrased the problem; we tried to phrase exactly what we want. So what would you say is a sentence that defines our loss function? Let's see, yes? [00:05:20] "An image that provides minimum cost." Okay, what's the cost you're talking about? "Expected iguana and not expected iguana." What do you mean exactly by that? We're trying to train it... yeah, okay. So you want this image to minimize a certain loss function, and the loss function would be a distance metric between the output you get and the output you want. Okay, yeah. So I would say we want to find x, the image, such that y-hat of x, which is the result of the forward propagation of x through the network, is equal to y-iguana, which is a one-hot vector with a one at the position of iguana. Does that make sense? So now, based on that, we define our loss function, which can be an L2 loss, can
be an L1 loss, [00:06:15] can be a cross-entropy; in practice, this one works better. So you see that minimizing this loss function will lead our image x to be output as an iguana by the network. That makes sense? And then the process is very similar to neural style transfer, where we optimize the image iteratively. We start with x, forward propagate it, compute the loss function that we just defined, and, remember, we're not training the network, right: we take the derivative of the loss function all the way back to the input, and update the input using a gradient descent algorithm, until we get something that is classified as an iguana. Yeah, any question on that? Okay. So, you mentioned that it doesn't guarantee that x is not going to look like something; the only thing it's guaranteeing is that this x will be classified as an iguana if we trained properly. We will talk about that.
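As a rough illustration of the loop just described, here is a toy sketch, not the lecture's ImageNet setup: the "network" is a single frozen sigmoid unit, a scalar target plays the role of y-iguana, and a squared error stands in for the loss. All names and constants here are illustrative assumptions.

```python
import math

# Toy stand-in for the attack loop above: freeze the "pretrained" parameters
# and run gradient descent on the *input*, never on the weights.

w, b = 2.0, -1.0  # frozen parameters: never updated below

def forward(x):
    """Forward propagation through the fixed 'network'."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def attack(x, target=1.0, lr=0.5, steps=500):
    """Gradient descent on the input x so that forward(x) approaches target."""
    for _ in range(steps):
        y_hat = forward(x)
        # dL/dx for L = (y_hat - target)^2, using sigmoid'(z) = y_hat*(1 - y_hat)
        grad = 2.0 * (y_hat - target) * y_hat * (1.0 - y_hat) * w
        x -= lr * grad  # update the input, the 1-D analogue of the pixels
    return x

x_adv = attack(0.0)
print(forward(0.0), forward(x_adv))  # the output is pushed toward the target
```

The same steps appear as in the lecture: forward propagate, compute the loss against the target class, take the derivative all the way back to the input, and update the input; only the scale (one scalar instead of a 64 by 64 by 3 image) differs.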
Now, another question in the back. [00:07:18] Yeah? Oh yeah, it could be binary cross-entropy, it could be cross-entropy. So in this case, not binary cross-entropy, because we have a vector of n classes, but it could have been cross-entropy here, okay. So yeah, that's true: do we guarantee that the forged image x, this one, is going to look like an iguana? Who thinks it's going to look like an iguana? A few. Who thinks it's not going to look like an iguana? Okay, the majority of you. So can someone tell me why it's not going to look like an iguana? [00:08:13] Okay, so you say the loss function is unconstrained, is very unconstrained, so we didn't put any constraint on what the image should look like. That's true. Actually, the answer to this question is: it depends. We don't know; maybe it looks like an iguana, maybe it doesn't; but in terms of probabilities, there's a high chance that it doesn't look
like an iguana. [00:08:30] So the reason is here: let's say this is our space of input images. An interesting thing is that, even if as humans, on a daily basis, we deal with images of the real world (I mean, if you look at a TV that is totally buggy, you see pixels, random pixels, but in other contexts we usually see real-world-distribution images), our network is deterministic: it takes any input image that fits the first layer and produces an output, right? So this is the whole space of input images that the network can see, and this is the space of real images; it's a lot smaller. Can someone tell me what's the size of the space of possible input images for a network? So, infinite? It's not infinite. It's a lot, but okay. Yeah, there is an idea here: someone says the number of possible pixel permutations. Yeah, that's true. So more precisely, you would start with
[00:09:39] how many pixel values there are. There are 256 pixel values. And then what's the size of an image? Let's say 64 by 64 by 3, and your result would be 256 to that power: you fix the first pixel, 256 possible values, then the second one can be anything else, then the third one can be anything else, and you end up with a very big number. So this is a huge number, and the space of real images is here. Now if we had to plot the space of images classified as an iguana by the network, it would be something like that, right? And you see that there is a small overlap between the space of real images and the space of images classified as an iguana by the network, and this is where we probably are not: we're probably in the green part that is not overlapping with the red part, because we didn't constrain our optimization problem. Does that make sense? Okay.
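The counting argument is easy to check directly. A quick sketch, assuming the 64 by 64 by 3 size and 256 pixel values used above:

```python
import math

# Count the input space as in the argument above: fix the first pixel
# (256 choices), then the second, and so on for every value in the image.
n_levels = 256
n_values = 64 * 64 * 3            # one image is 12288 pixel values
n_images = n_levels ** n_values   # exact big integer: 256**12288

# How large is that? Count its decimal digits via logarithms.
digits = math.floor(n_values * math.log10(n_levels)) + 1
print(n_values, digits)           # 12288 values, 29593 digits
```

Finite, as the lecturer says, but the count has close to thirty thousand decimal digits, which is why the space of real images is a vanishingly small region inside it.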
Now we're going to constrain it [00:10:39] a little bit more, because in practice this type of attack is not too dangerous: as humans, we would see that the pictures look like garbage. The dangerous attack is when the picture looks like a cat, the network sees it as an iguana, and humans see it as a cat. Can someone think of malicious applications of that? Face recognition: you could show a picture of your face and push the network to think it's the face of someone else. What else? Yeah, breaking CAPTCHAs: if you know what output you want, you can force the network to think that this input CAPTCHA is the output it's looking for. Or in general, I would say, social media: if someone is malicious and wants to put violent content online, all these companies have algorithms to check for this violent content; if people can use
adversarial examples [00:11:49] that still look violent but are not detected as violent by the algorithms, using this methodology they could still publish their violent pictures. Think about self-driving cars: a stop sign that looks like a stop sign to everyone, but when the self-driving car sees it, it's not a stop sign. So these are malicious applications of adversarial examples, and there are a lot more. Okay, and in fact the picture we generated previously would look like that; it's nothing special. So now let's constrain our problem a little bit more. We're going to say we want the picture to look like a cat but be classified as an iguana. Okay, so now, same thing: we have our neural network; if we give it a cat, it's going to predict that it's a cat. What we want is
before so I just plot I just put [00:12:45] we did before so I just plot I just put back what we had on the previous slide [00:12:47] back what we had on the previous slide okay exactly the same thing now the way [00:12:51] okay exactly the same thing now the way we phrase our problem will be a little [00:12:52] we phrase our problem will be a little different instead of saying we want only [00:12:55] different instead of saying we want only y hat of x equals y y now we have [00:12:58] y hat of x equals y y now we have another constraint what's the other [00:12:59] another constraint what's the other constraint the picture X should be [00:13:11] constraint the picture X should be closer to the picture of the cat so we [00:13:12] closer to the picture of the cat so we want X equal or very close to X cat and [00:13:16] want X equal or very close to X cat and in terms of loss function what it does [00:13:18] in terms of loss function what it does is that it adds another term which is [00:13:23] is that it adds another term which is going to decide how X should be close to [00:13:25] going to decide how X should be close to X cat if we minimize this loss now we [00:13:27] X cat if we minimize this loss now we should have an image that looks like a [00:13:30] should have an image that looks like a cat because of the second term and that [00:13:31] cat because of the second term and that is predicted as an iguana because of the [00:13:34] is predicted as an iguana because of the first term does that make sense [00:13:36] first term does that make sense so we're just building up our loss [00:13:38] so we're just building up our loss functions and I guess you guys are very [00:13:39] functions and I guess you guys are very familiar with this type of thought [00:13:41] familiar with this type of thought process now okay an same process we [00:13:44] process now okay an same process we optimized until we hopefully get a cat [00:13:47] optimized until we hopefully get a cat 
now a question is, what should be the initial image we start with? We didn't talk about that in the previous example. [00:14:04] Yeah? White noise, well, yeah, possibly white noise. Any others? A cat, yeah, which cat? I don't know, probably the cat that we put in the loss function, right, because it's the closest one to what we want to get. So if we want a fast process, we'd better start with exactly this cat, which is the one we put in our loss function here, right. If we put another cat, it's going to take a little longer, because we have to change the pixels of the other cat to look like this cat; that's what we told our loss function. If we start with white noise it will take even longer, because we have to change the pixels all the way so that it looks real, and then it looks like the cat that we defined here. So yeah, the best thing would probably be to start with the picture of the cat. Does that make sense? And
then move the pixels so that this term is also minimized. [00:15:22] Yeah, this is empirical, the fact that we use that type of loss function, but in practice it could have been any distance between x and x_cat, and any distance between y hat and y. Yeah? [00:16:09] Exactly, it's a bunch of cats. I'm not sure about the second method, but just to repeat the point you mentioned: here we had to choose a cat, meaning x_cat is actually an image of a cat. So what if we don't know what the cat should look like, and we just want a random cat to come out and be classified as an iguana? We're going to see generative networks later, which can be used to do that type of thing, but for the second part of the question I'm not sure what the optimization process would look like, okay. Let's move on. So yeah, it's probably a good idea to start with the cat
image that we specified in the loss function, okay. And so then we have an image of a cat that originally was classified as 92% cat, and we modified a few pixels, so you can see that this image looks a little blurry. [00:16:59] By doing this modification, the network will think it's an iguana, okay? And sometimes this modification can be very slight, and we may not even be able to notice it. Sounds good? [00:17:12] Now let's add something else to this graph. We add a third set, which is the space of images that look real to humans. That's interesting, because the space of images that look real to humans is actually a bigger space than the space of real images. An example is this one: this is probably an image that looks real to a human, but it's not an image that we could have seen in daily life, because of these slight pixel changes, okay? So these are the space of dangerous adversarial examples: they look
[00:17:47] real to humans, but they're not actually real, and they might be used to fool models, okay. [00:17:56] Now let's see a video by Kurakin et al. on a real-world example of adversarial examples. For those who cannot see it: they're pointing a camera, which has a classifier, and the classifier classifies the first image as a library, and the second image, which looks the same to us, as a prison. So the second image has slightly different pixels, but that's hard for a human to see. Same here: the classifier on the phone classifies the first image as a washer with fifty-two percent confidence, and the second one as a doormat. So this is a small example of what can be done, okay. [00:18:52] Now, we've seen how to generate these adversarial examples, it's an optimization process. We will see what types of attacks can be carried out, and what are the
defenses against these adversarial examples. [00:19:03] We would usually split the attacks into two types: non-targeted attacks and targeted attacks. A non-targeted attack means we just want to find an adversarial example that is going to fool the model, while a targeted attack means we want to force this adversarial example to make the model output a specific class that we chose. These are two different types of attacks that are widely discussed in the research. [00:19:36] Knowledge of the attacker is something very important. For those of you who did some crypto, you know that we talk about white-box attacks and black-box attacks. So a white-box attack is when you have access to the network: we have our image and a pretrained network, and we have full access to all the parameters and the gradients, so it's
probably an easier attack, right? We can backpropagate all the way back to the image and update the image with it. [00:20:07] A black-box attack is when the model is, say, encrypted or something like that, so that we don't have access to its parameters, activations, and architecture. So the question is, how do we attack in a black-box setting, if we cannot backpropagate because we don't have access to the layers? Any ideas? Yeah, numerical gradient, yeah, good idea. So you will tweak the image a little bit and you will see how it changes the loss; looking at this you can have an estimate of the numerical gradient, even if the model is a black-box model. This assumes that you can query the model, right? You can query it. What if you cannot even query the model, or you can query it one time only, just to send your adversarial example? How would you do that? So
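The numerical-gradient idea suggested here can be sketched as follows, assuming the model is an opaque function `query_loss` that we can call but not inspect (the hidden weights exist only so the sketch can check itself):

```python
import numpy as np

rng = np.random.default_rng(1)
w_hidden = rng.normal(size=5)      # hidden from the attacker

def query_loss(x):
    # Stand-in for "send x to the black-box model, read back a score".
    return float(np.tanh(w_hidden @ x) ** 2)

def numerical_grad(x, eps=1e-5):
    # Central finite differences: two queries per input coordinate.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (query_loss(x + e) - query_loss(x - e)) / (2 * eps)
    return g

x = rng.normal(size=5)
g = numerical_grad(x)
# The true gradient, which the attacker cannot compute but we can check against:
t = np.tanh(w_hidden @ x)
true_g = 2 * t * (1 - t**2) * w_hidden
print(np.max(np.abs(g - true_g)))  # tiny estimation error
```

Each central difference costs two queries per coordinate, so one estimated gradient of a 64x64x3 image already takes about 24,576 queries, which is why limits on querying matter.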
this becomes more complicated. [00:21:07] So there is a very interesting property of these adversarial examples, which is that they're highly transferable. It means: I have a model here that is an animal classifier, okay, I don't have access to it, I cannot even query it, and I still want to fool it. What I'm going to do is build my own animal classifier, forge an adversarial example on it, and it's highly likely that it's going to be an adversarial example for the other one as well. This is called transferability, and it's still a research topic, okay. We're trying to understand why this happens and also how to defend against it. Maybe one defense against it is... we're going to see it after, I'm not going to say it now. So does that make sense or not, this transferability? Probably it's because two animal classifiers look at the same features in images, right, and maybe these
pixels that we're playing with are also changing the output of the other network. [00:22:13] Let's go over some kinds of defenses. One solution to defend against these adversarial examples is to create a safety net. What is a safety net? It's like a firewall: you put it before your network, every image that comes in will be classified as fake (forged) or real by this network, and you only take those which are real and not adversarial. Does that make sense? So you could say, okay, but we can also build an adversarial example that fools this network, right? Be it black box or white box, we can just create an adversarial example for this network. It's true, but the issue is that now we have two constraints: we have to fool the first one and the second one at the same time. You know, maybe if you fool the first
one, there is a chance that the second one is going to be fooled, we don't know, okay. It just makes it more complex; there is no good defense at this point to all types of adversarial examples. This is an option that people are researching, and the paper is here if you want to check it out. [00:23:21] Can you guys think of another solution? Train on multiple loss functions through different networks? So you're talking about an ensemble: maybe we can create five networks to do our task, and it's highly unlikely that the adversarial example is going to fool the five networks the same way, right? Any other ideas? [00:23:58] Generate adversarial examples and train on those, okay. So you will generate a cat image that is adversarial, so some pixels have been changed to fool a network; you will label it as the human sees it, so as a cat, because you want the network to still see it as a cat, and you will train on
[00:24:24] those. The downside of that is that it's very costly; we've seen that generating adversarial examples is super costly, and also we don't know if it will generalize to other adversarial examples, maybe we're going to overfit to the ones we have. So it's another optimization problem. [00:24:40] Another solution is to train on adversarial examples at the same time as we train on normal examples. So look at this loss function: the new loss is a sum of two loss functions. One is the classic loss function we would use, let's say cross-entropy in the case of classification, and the second one is the same loss function, but we give it the adversarial example. So what's the complexity of that at every gradient descent step? [00:25:22] For every iteration of our gradient descent, we're going to have to iterate enough to forge an adversarial example at
every step, right? Because we have x, what we want to do is forward propagate x through the network to compute the first term, generate x_adversarial with the optimization process and forward propagate it to calculate the second term, and then backpropagate over the weights of the network. This is super costly as well, and it's very similar to what you said, just done online, all the time, okay. [00:25:56] What is interesting, and we're going to delve a little more into it: there's another technique called logit pairing. I just put it here, we're not going to talk about it, there's a paper here if you want to check it; it's another way to do adversarial training. But what I would like to talk about is, more from a theoretical perspective, why are neural networks vulnerable to adversarial examples? So let's do some work on the board. [00:26:39] Yeah, the noise thing is also nice, but
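The combined loss L_new = L(y hat(x), y) + L(y hat(x_adv), y) can be sketched on a toy logistic regression. The single sign-of-the-gradient step used here to forge x_adv is an assumed choice of inner attack, since the lecture leaves that part open:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)
X = rng.normal(size=(64, 4))
Y = (X @ np.array([1.0, -2.0, 0.5, 1.0]) > 0).astype(float)  # toy labels
eps, lr = 0.1, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(x, y):
    # Cross-entropy on a logistic unit: returns (dL/dw, dL/dx).
    p = sigmoid(w @ x)
    return (p - y) * x, (p - y) * w

for _ in range(200):
    for x, y in zip(X, Y):
        g_w, g_x = grads(x, y)
        x_adv = x + eps * np.sign(g_x)     # forge x_adv at this very step
        g_w_adv, _ = grads(x_adv, y)
        w -= lr * (g_w + g_w_adv)          # L_new = L(x, y) + L(x_adv, y)

acc = np.mean((sigmoid(X @ w) > 0.5) == (Y > 0.5))
print(acc)
```

Note that x_adv is re-forged at every single update, which is exactly the extra per-step cost the lecture points out.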
the thing is that it's just like in crypto: every time you come up with a defense, someone will come up with an attack, and it's a race between humans, you know. So this is the same type of problem that security problems are. [00:26:57] Okay, so let's go over something interesting that is more on the intuition side of adversarial examples, so let me write down something. One question we asked ourselves is, why do adversarial examples exist? What's the reason? Ian Goodfellow and his team came up with an explanation in one of the seminal papers on adversarial examples, where they argue that although many people in the past have attributed the existence of adversarial examples to the high non-linearity of neural networks and to overfitting, because we overfit to a specific dataset, we don't actually understand what cats are, we're just
understanding what we've been trained on. They argue that it's actually the linear parts of networks that are the cause of the existence of adversarial examples. So let's see why, and the example I'm going to look at is linear regression. [00:27:58] Like logistic regression, linear regression is basically the same thing but without the sigmoid: before the sigmoid we have y hat equals Wx plus b. So the forward propagation of our network is going to be y hat = Wx + b, and our first example is going to be a six-dimensional input. [00:28:33] We have a neuron here, but the neuron doesn't have any activation, because we're doing linear regression, so what happens here is simply Wx + b, and then we get y hat, and we'd probably use an L1 or L2 loss, because it's a regression problem, to train this network. Now let's look at the first example, a
first example, where we trained our network. So the network has been trained and converged [00:29:24] to W = [1, 3, -1, 2, 2, 3]. This is W, and, you know, because we defined x to be a column vector of size 6, W has to be a row vector of size 6. So the network converged to this value of W, and b = 0. Now we're going to look at this input; we're giving a new input to the network, and the input is going to be x = [1, -1, 2, 0, 3, -2]. [00:30:12] So I'm going to forward propagate this to get y hat = Wx + b, and this value is going to be 1 - 3 - 2 + 0 + 6 - 6, if I didn't make a mistake, so we basically get -4. [00:30:59] Okay, so this is the first example that was forward propagated. Now the question is how to change x into x* [00:31:22] such that y hat changes radically, but x* stays close to x. So this is basically our problem with adversarial examples: can
we find an example that is very close to x but radically changes the output of our network? We're trying to build intuition about adversarial examples. [00:32:02] So the interesting part is to identify how we should modify x, and the intuition comes from the derivative. If you take the derivative of y hat with respect to x, you know that the definition of this term is, like, correlated to the impact on y hat of small changes of x, right? What's the impact of small changes of x on the output? And if you compute it, what do you get? W. Everybody agrees? What's the shape of this thing? The shape of that is the same as the shape of x, so it should be W transpose; remember, the derivative of a scalar with respect to a vector has the shape of the vector. [00:33:18] Now it's interesting to see this, because if we compute x* to be, let's say, x plus a small perturbation
like, I will call it the perturbation. Yeah, sorry, can you see at the top? Yes or no? [00:33:48] So what if x* equals x plus epsilon times W transpose, you know, and this epsilon I will call the value of the perturbation. Now if we forward propagate x*, it means we do y hat* = W x* + b, but b is 0 at this point, so we're going to get Wx plus epsilon times W W transpose, and W times W transpose is a dot product, right? So this is the same as the norm of W squared. [00:34:45] What is interesting, the smart part here, is that this term is always going to be positive. It means we moved x a little bit, because we can make this change small by choosing a small epsilon, but it's going to push y hat to a larger value for sure, you know. And if I had a minus here instead of a plus, it would push y hat to a smaller value. And the interesting thing is, now, if we compute x* to be x
plus epsilon times W transpose, and we take epsilon to be a small value, let's say 0.2, [00:35:28] you can make the calculation, and what we get is this: x* = [1, -1, 2, 0, 3, -2] + 0.2 × [1, 3, -1, 2, 2, 3]. So if you look at that, all the positive values have been pushed... sorry, let's finish the calculation and I'll give the insight after: [1.2, -0.4, 1.8, 0.4, 3.4, -1.4]. So this is our x* that we hope to be adversarial. [00:36:43] Okay, let's compute y hat* to see what happens. It's W x* + b, which is zero, so what we get when we multiply W by x* is 1.2, minus 1.2, minus 1.8, plus 0.8, plus six
point eight and minus 4.2 which I believe is going to [00:37:32] minus 4.2 which I believe is going to give us zero point five okay so we see [00:37:40] give us zero point five okay so we see that a very slight change in X star has [00:37:44] that a very slight change in X star has pushed Y hat from minus 4 to point 5 and [00:37:49] pushed Y hat from minus 4 to point 5 and so a few things we want to notice here [00:37:59] so insights on this on this small [00:38:01] so insights on this on this small example the first one is that if W is [00:38:09] example the first one is that if W is large then X star is not similar to X [00:38:22] large then X star is not similar to X right the larger the W the less X star [00:38:27] right the larger the W the less X star is is likely to be like X and [00:38:28] is is likely to be like X and specifically if one entry of the a value [00:38:31] specifically if one entry of the a value is very large X I the pixel [00:38:35] is very large X I the pixel corresponding to this entry is going to [00:38:36] corresponding to this entry is going to be very different from X I star if W is [00:38:42] be very different from X I star if W is large X star is going to be different [00:38:44] large X star is going to be different than X so what we're going to do is that [00:38:47] than X so what we're going to do is that we're going to take sine sine of W [00:38:55] we're going to take sine sine of W instead of taking W what's the reason [00:38:58] instead of taking W what's the reason why we do that because the interesting [00:38:59] why we do that because the interesting part is the sine of of the W it means if [00:39:03] part is the sine of of the W it means if we play correctly with the sign of W we [00:39:07] we play correctly with the sign of W we will always push the X this term W X [00:39:12] will always push the X this term W X star in the positive side because every [00:39:16] star in the positive side because every entry here this 
multiplication is going [00:39:18] entry here this multiplication is going to give us a positive number right and [00:39:23] the second insight is that as X grows in [00:39:31] the second insight is that as X grows in dimension the impact of plus Epsilon [00:39:44] dimension the impact of plus Epsilon sign of W increases that makes sense [00:40:00] so the impact of sign of W on white hats [00:40:05] so the impact of sign of W on white hats increases and so what's interesting to [00:40:10] increases and so what's interesting to notice is that we can keep epsilon as [00:40:13] notice is that we can keep epsilon as small as possible it means X and X star [00:40:15] small as possible it means X and X star will be very similar but as we grow in [00:40:18] will be very similar but as we grow in dimension we're going to get more term [00:40:20] dimension we're going to get more term in this a lot more term and the change [00:40:23] in this a lot more term and the change in Y hat is going to grow and grow and [00:40:25] in Y hat is going to grow and grow and grow and grow and grow and so the one [00:40:27] grow and grow and grow and so the one reason why adversity all examples exist [00:40:29] reason why adversity all examples exist for images is because the dimension is [00:40:32] for images is because the dimension is very high 64 by 64 by 3 so we can make [00:40:36] very high 64 by 64 by 3 so we can make epsilon very small and take the sign of [00:40:39] epsilon very small and take the sign of W we will still get Y hat to be far from [00:40:44] W we will still get Y hat to be far from the original value that it had does it [00:40:46] the original value that it had does it make sense do you guys have any question [00:40:49] make sense do you guys have any question on that so epsilon doesn't grow with the [00:40:53] on that so epsilon doesn't grow with the dimension but its impact of this term [00:40:56] dimension but its impact of this term increases with the dimension 
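The arithmetic above can be checked mechanically. The concrete values x = (1, -1, 2, 0, 3, -2) and W = (1, 3, -1, 2, 2, 3) are a reconstruction inferred from the spoken numbers (they reproduce every intermediate term, but they are not stated explicitly in the transcript). A minimal pure-Python sketch of both insights:

```python
import random

# Reconstructed toy example (x and W are assumptions inferred from the
# spoken arithmetic; they reproduce each intermediate term in the lecture).
eps = 0.2
x = [1.0, -1.0, 2.0, 0.0, 3.0, -2.0]
W = [1.0, 3.0, -1.0, 2.0, 2.0, 3.0]   # single row of weights, b = 0

def y_hat(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

print(y_hat(W, x))                                   # -4.0

# Perturb along W^T: x* = x + eps * W^T
x_star = [xi + eps * wi for xi, wi in zip(x, W)]
print([round(v, 1) for v in x_star])                 # [1.2, -0.4, 1.8, 0.4, 3.4, -1.4]
print(round(y_hat(W, x_star), 1))                    # 1.6, pushed up from -4

# Insight 2: with x* = x + eps * sign(W), the shift in y_hat is
# eps * sum(|w_i|), which grows with the input dimension n even
# though each individual pixel moves by at most eps.
random.seed(0)
shifts = []
for n in [10, 1_000, 64 * 64 * 3]:                   # 64*64*3: image size from the lecture
    Wn = [random.uniform(-1, 1) for _ in range(n)]
    shifts.append(eps * sum(abs(w) for w in Wn))
print([round(s, 1) for s in shifts])                 # grows roughly linearly in n
```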
[00:41:27] [Student question, partly inaudible: what if you put something in between that maps the adversarial example back to another image, and train it adversarially?] Yeah, I don't know if that has been done; I don't think that has been done. You're talking about taking something like an autoencoder that takes the adversarial example, converts it back to a normal image of the cat, and then classifies the cat? Maybe, yeah, I don't know; it's a topic of research. [00:42:03] Okay, let's move on, because we don't have too much time. So to conclude, what we're going to keep as a general way to generate adversarial examples is this formula, x* = x + ε sign(∇ₓJ); this is going to be a fast way to generate adversarial examples, and the method is called the fast gradient sign method. Basically, what we're doing is linearizing the cost function in the proximity of the parameters.
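The fast gradient sign step x* = x + ε sign(∇ₓJ(x, y)) can be sketched on a toy logistic unit. This is an assumption for illustration, not the lecture's board example: for ŷ = sigmoid(w·x) with binary cross-entropy J, the input gradient has the standard closed form ∇ₓJ = (ŷ - y) w, so only signs are needed.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, w, y_true, eps):
    """One fast-gradient-sign step: x* = x + eps * sign(grad_x J).

    For y_hat = sigmoid(w . x) with binary cross-entropy J,
    grad_x J = (y_hat - y_true) * w, so per coordinate we only
    need sign((y_hat - y_true) * w_i).
    """
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    g = y_hat - y_true
    return [xi + eps * sign(g * wi) for xi, wi in zip(x, w)]

# Toy weights and input reused from the worked example; true label 0 ("not a cat").
w = [1.0, 3.0, -1.0, 2.0, 2.0, 3.0]
x = [1.0, -1.0, 2.0, 0.0, 3.0, -2.0]

p_before = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))     # ~0.018
x_adv = fgsm(x, w, y_true=0.0, eps=0.2)
p_after = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)))  # ~0.168
print(round(p_before, 3), round(p_after, 3))
```

Every coordinate moves by exactly ε, yet the score w·x jumps by ε·Σ|wᵢ|, so the predicted probability moves toward the wrong class; with image-sized inputs that sum is far larger.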
And we're saying that what applied to the linear network here is also going to apply, through this general formula, to deeper networks: we're pushing the image's pixels in the one direction that is going to impact the output the most. Okay, so that's the intuition behind it. [00:43:05] Now you might say: okay, we did this example on a linear network, but neural networks are not linear, they're highly nonlinear. In fact, if you look at where the research has been going for the past few years, we're trying to linearize all the behaviors of these neural networks: with ReLU, for example, or with careful initialization, all those types of methods. Even with a sigmoid, when we train we do all we can to put the sigmoid in its linear regime, because we want fast training. [00:43:36] Okay, and one last thing that I'll mention for adversarial examples: if I have a network like this, fully connected, with three-dimensional inputs, then one hidden layer here, and then the output, what's interesting is that computing the chain rule on this neuron will give you that the derivative of the loss function with respect to, let's say, x is equal to the derivative of the loss function with respect to z₁₁ here, times the derivative of z₁₁ with respect to x. There's actually a summation here, but let me just illustrate the point. [00:44:44] What we're trying to do with neural networks is to have these gradients be high, because if this gradient is not high, we're not able to train the parameters of this neuron. And we need this gradient to be high because if you want to do the same thing with W₁₁, the parameters related to this neuron, you would need to go through this same term, correct? So we need this gradient to be high.
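Written out with the summation he alludes to (assuming z₁ₖ denotes the pre-activation of the k-th unit in the first layer), the two derivatives share a common factor:

```latex
\frac{\partial \mathcal{L}}{\partial x}
  = \sum_{k} \frac{\partial \mathcal{L}}{\partial z_{1k}}\,
             \frac{\partial z_{1k}}{\partial x},
\qquad
\frac{\partial \mathcal{L}}{\partial W_{11}}
  = \frac{\partial \mathcal{L}}{\partial z_{11}}\,
    \frac{\partial z_{11}}{\partial W_{11}}
```

Training requires ∂L/∂z₁₁ to be large so that W₁₁ receives a useful gradient, and that same factor scales ∂L/∂x, which is exactly the gradient the fast gradient sign method exploits.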
And if this gradient is high, the gradient with respect to the input is also going to be high, because you use the same gradient in the chain rule. So networks that have high gradients and that are operating in the linear regime are even more vulnerable to adversarial examples, because of this observation. [00:45:35] So, any questions on adversarial examples before we move on? I think we don't have time, and I would like to go over GANs with you guys, so let's move on to GANs; I'll stick around to answer questions on that part. [00:45:51] So the general question we're asking now is: do neural networks understand the data? Because we've seen that some data points look like they would be real, but the neural networks don't understand them. More generally, can we build generative networks that can mimic the real-world distribution of, let's say, images? This is what we will call
generative adversarial networks. [00:46:15] We'll start by motivating them, and then we'll look at something called the minimax game between two networks, a generator and a discriminator, which are going to help each other improve. Finally, we will see that GANs are hard to train, we'll see some tips for training them, and we'll go over some nice results and methods for evaluating GANs. [00:46:41] Okay, so the motivation behind generative adversarial networks is to endow computers with an understanding of our world. By that we mean that we want to collect a lot of data and use it to train a model that can generate images that look like they're real, even if they're not; so a dog that has never existed can be generated by this network. And finally, the number of parameters of the model is smaller than the amount of data (we already talked about that), and this is the intuition behind why a generative network can exist: there is so much data in the world (any image counts as data for a generative network) and there are not enough parameters to mimic this data exactly, so the network needs to understand the salient features of the dataset, because it doesn't have enough parameters to overfit everything. [00:47:35] So let's talk about probability distributions. These are samples of real images that have been taken, and if you plot this real data distribution on a 2D map, it would look something like this. I made it up, but this is the image space, similar to what we talked about for adversarial examples, and this green shape is the space of real-world images. Now, if you train a generator and it generates some images that look like this (these images come from StackGAN), this distribution, if the generator is not good, is not going to match the real-world
distribution. [00:48:16] So our goal here is to do something so that the red distribution matches the real-world distribution; we're going to train the network so that it realizes what we want. [00:48:28] So this is our generator, and it's what we ultimately want to train. We want to give it, let's say, a random number, a random latent code of 100 scalar values, and we want it to output an image. But of course, because it's not trained initially, it's going to output a random image that looks something like this: random pixels. Now, this image doesn't look very good; what we want is for these generated images to be very similar to the real world. So how are we going to help this generator train? It's not like what we did in classic supervised learning, because we don't really have inputs and labels; there is no label. We could [00:49:16] maybe give it an image of a cat and ask it to output another cat, but we want the network to be able to output things that don't exist, things we've never seen, right? We want the network to understand what a cat is, but not overfit to the cat we give it. [00:49:33] So the way we're going to do it is through a small game between this network, called the generator G, and another network, called the discriminator D. Let's look at how it works. We have a database of real images, and we're going to start with this distribution on the bottom, which is the real-world data distribution, the distribution of the images in this database. [00:49:59] Now, our generator initially has this other distribution; it means the pixels that you see here follow a distribution that doesn't match the real world. We will define a discriminator D, and the goal of the discriminator will be to detect whether an image
is real or not. [00:50:19] So we're going to give several images to this discriminator: sometimes we will give it generated images, and sometimes we will give it real-world images. What we want is for this discriminator to be a binary classifier that outputs 1 if the image is real and 0 if the image was generated. Okay? So let's say we give it an x coming from the generated images: it should give us 0, because we want the discriminator to detect that x was actually G(z). If the image came from our database of real images, we want the discriminator to say 1. [00:51:02] So it seems like the discriminator would be easy to train, right? It's just binary classification; we can define a loss function that is the binary cross-entropy. And the good thing is that we can have as many labels as we want; it's unsupervised, but a little bit supervised, you know: we have this database and we label it all as 1 (these images exist, so let's label them 1 for the discriminator), and everything that comes out of the generator, let's label it 0 for the discriminator. So basically, data is not costly at all at this point. [00:51:34] The way we will train is that we will backpropagate the gradient to the discriminator, to train the discriminator using a binary cross-entropy, roughly. But what we ultimately want is to train the generator; that's what we want at the end. We're not going to use the discriminator; we just want to generate images. So we're also going to direct the gradient to go back to the generator. And why does this gradient go back to the generator? The reason is that x = G(z); it means we can backpropagate the gradient all the way back to the input of the discriminator, but this input depends on the input of the generator if the image was generated, so we can also backpropagate and direct the gradient to the generator. Does that make sense? There is a direct relation between z and the loss function in the case where the image was generated. If the image was real, then the generator couldn't get a gradient, because x doesn't depend on z or on the features and parameters of the generator. [00:52:32] Okay, so we would run an algorithm such as Adam simultaneously on two mini-batches: one for the true data and one for the generated data. Does this scheme make sense to everyone? [00:52:46] Yeah, one question. [Student question about mixing real and generated examples within one batch.] So there are many methods; usually we would use one mini-batch for the real data and one mini-batch for the fake data, but in practice you can try other things. There are many methods being tried to train GANs properly; we're going to delve a little more
into the details of that when we see the loss functions. [00:53:19] So we hope that the probability distributions will match at the end, and if they match, we're going to just take the generator and generate images; normally it should be able to generate images that look real, that look like they came from this distribution. Okay, sounds good. [00:53:36] So now let's talk more about the training procedure and try to figure out what the loss function should be in this case. What should the cost of the discriminator be, assuming we give it two mini-batches, one of real data (real images) and one of generated data that comes from G? [Student: the same basic loss function we use for binary classifiers?] Yes, the same basic loss function we use for binary classifiers. It's true; we're going to tweak it a tiny bit, but it's the same idea. [00:54:18] So this is what it can look like. We're going to call it J⁽ᴰ⁾, the cost function of the discriminator. It has two terms. What does the first term say, and what does the second term say? You can recognize the binary cross-entropy here; the only difference is that we have a label y_real and a label y_generated. In practice, y_real and y_generated are always going to be set to known values: we know that y_generated is 0 and we know that y_real is 1, so we can just remove these two label terms, because the coefficients in front of each logarithm are equal to 1. [00:54:53] The first term is telling us that D should correctly label real data as 1; it is the first term of a binary cross-entropy. The second term is telling us that D should correctly label generated data as 0. So the difference from the classic cross-entropy we've seen is that the first summation is over the real mini-batch, and the summation in the second cross-entropy is over the generated mini-batch. Does that make sense? So we want D both to correctly identify the real data and to correctly identify the fake data.
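The two-term cost just described, a binary cross-entropy with y_real = 1 over the real mini-batch and y_generated = 0 over the generated mini-batch, can be written out directly. The helper name and the probability values below are hypothetical, just for illustration; `d_real` and `d_fake` stand for the discriminator's sigmoid outputs on the two mini-batches:

```python
import math

def discriminator_cost(d_real, d_fake):
    """J_D = -(1/m) sum log D(x)  -  (1/m') sum log(1 - D(G(z))).

    First term: real mini-batch, labeled 1. Second term: generated
    mini-batch, labeled 0. d_real / d_fake are D's output probabilities.
    """
    real_term = -sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator doing well: high scores on real data, low on fake.
good = discriminator_cost([0.9, 0.95, 0.8], [0.1, 0.05, 0.2])
# An undecided discriminator outputting one half everywhere.
undecided = discriminator_cost([0.5] * 3, [0.5] * 3)
print(round(good, 3), round(undecided, 3))   # the better D has the lower cost
```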
That's why we have two terms. [00:55:34] Now, what about the generator? What do you think the cost function of the generator should be? [Student: flip it; from the generator's side I only want to run the first half, because I don't have any yᵢ for the inputs coming into the generator.] Yeah, exactly. And yes, in your batch you will have a certain number of real examples and a certain number of generated examples; the generated examples have no impact on the first cross-entropy, and the same goes for the real examples on the second cross-entropy. Any other questions? [00:56:14] Okay, so coming back to the cost of the generator: what should it be? This is a tiny bit complicated, so let's move on, because we don't have too much time. The cost of the generator should basically say that G should try to fool D: the goal is to forge, to generate realistic samples, and in order to generate realistic samples we want to fool D. If G manages to fool D, and D is very good, it means G is very good, right? The problem is that it's a game: if D is bad and G fools D, it doesn't mean that G is good, because since D is bad, it doesn't detect the real versus fake examples very well. We want D to get very good and G to improve at the same time, until an equilibrium is reached, at a certain point where D will always output one half, like random probabilities, because it cannot distinguish the samples coming from G from the real samples. [00:57:18] So this cost function is basically saying that for generated images, we want D to classify them as 1. Okay. [00:57:59] So, how do you implement that? If you're using a deep learning framework, you've been building a graph, right? And at the end of your
graph you've been building your cost functions D that is [00:58:10] building your cost functions D that is very close to a binary cross-entropy [00:58:12] very close to a binary cross-entropy what you're going to just do is to [00:58:15] what you're going to just do is to define a node that is going to be minus [00:58:16] define a node that is going to be minus the cost function of D it's going every [00:58:20] the cost function of D it's going every time you're going to call the function J [00:58:23] time you're going to call the function J of G is going to run the graph [00:58:27] of G is going to run the graph that you define for JFD and run a an [00:58:30] that you define for JFD and run a an opposition operation an opposite of [00:58:32] opposition operation an opposite of operation same way propagate gradients [00:58:43] operation same way propagate gradients back the same way we're not going to [00:58:45] back the same way we're not going to propagate the same way we're going to [00:58:47] propagate the same way we're going to turn into a minus sign for the grade for [00:58:50] turn into a minus sign for the grade for the generator so you know you you back [00:58:53] the generator so you know you you back propagate on the on the on D and when [00:58:56] propagate on the on the on D and when you back propagate on G you would flip [00:58:57] you back propagate on G you would flip you would flip the sign that's all we do [00:59:01] you would flip the sign that's all we do the same thing with the sign fleet terms [00:59:03] the same thing with the sign fleet terms of implementation is just another [00:59:05] of implementation is just another operation okay now let's look at [00:59:08] operation okay now let's look at something interesting is that this Lord [00:59:12] something interesting is that this Lord logarithm let's look at the graph of the [00:59:17] logarithm let's look at the graph of the logarithm so I'm going to plot against [00:59:27] logarithm so 
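As a rough numpy sketch of that sign flip (the helper names `j_d` and `j_g` are made up here, and plain numbers stand in for the graph): the minimax generator cost is just the negated fake-sample term of the discriminator's binary cross-entropy, so its gradient is the discriminator's gradient with the sign flipped.

```python
import numpy as np

def j_d(d_real, d_fake):
    """Discriminator's binary cross-entropy: push D(x) -> 1, D(G(z)) -> 0."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def j_g(d_fake):
    """Minimax generator cost J(G) = -J(D), keeping only the fake term."""
    return np.log(1.0 - d_fake)

# Numerical gradients w.r.t. d_fake = D(G(z)): same magnitude, opposite sign.
d_fake, eps = 0.3, 1e-6
grad_d = (j_d(0.9, d_fake + eps) - j_d(0.9, d_fake - eps)) / (2 * eps)
grad_g = (j_g(d_fake + eps) - j_g(d_fake - eps)) / (2 * eps)
print(round(grad_d, 3), round(grad_g, 3))  # 1.429 -1.429
```

So one backward pass serves both players; the generator's update just negates the gradient flowing through the fake term.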
[00:59:32] Okay, now let's look at something interesting: this logarithm. Let's look at the graph of the logarithm. I'm going to plot, on the x-axis, D(G(z)). What does this mean? This axis is the output of D when given a generated example. D(G(z)) is going to be between 0 and 1, because it's a probability — D is a binary classifier with a sigmoid output. If we plot log(x), this type of thing, this would be log(D(G(z))). Does it make sense? It's the logarithm function. If I plot minus that — −log(D(G(z))) — or, let me do something else: let me plot another function, log(1 − D(G(z))). This is it — do you guys agree? Okay, so the question is: right now, what we're doing is saying the cost function of the generator is log(1 − D(G(z))), so it
[01:01:12] looks like this one. What's the issue with this one? What do you think is the issue with this cost function, looking at it like that? [Student answer] Sorry, can you say it louder? It goes to negative infinity at one — that's what you mean? Yeah. And the consequence of that is that the gradient here is going to be very large the closer we go to one, but the closer we are to zero, the lower the gradient. It's the reverse phenomenon for this logarithm: the gradient is very high — I mean very high in absolute value — when we're close to zero, but it's very low when we go close to one. Okay, so which loss function do you think would be better to train our generator: a loss function that looks like this one, or a loss function that looks like this one? The broader question is: where are we early in the training — are we close to here, or are we close to there? What does it mean to be
[01:02:36] close to one? You're fooling the network — it means D thinks that the generated samples are real. Here, this place, is the contrary: D thinks that the generated samples are fake — it correctly finds out that they're fake. Early on, we're generally here, because the discriminator is better than the generator: the generator outputs garbage at the beginning, and it's very easy for the discriminator to figure out that it's fake, because this garbage looks very different from real-world data. So early on, we're here. So which function is the best one to be our cost? Yeah — probably this one is better. So we have to use a mathematical trick to change this into that, and the mathematical trick is pretty standard: right now we're minimizing something that is in log(1 − x); we can say that doing so is the same as maximizing something that is in log(x) — a simple
[01:03:46] min–max flip — and we can also say that it's the same as minimizing something in −log(x). Does that make sense? So we're going to use this mathematical trick to convert our function from what we would call a saturating cost into a non-saturating cost that is going to look more like this. Let's see what it looks like. So, to sum up: our cost function currently looks like that. It's a saturating cost, because early on the gradients are small and we cannot train G. We're going to do the flip that I just talked about on the board and convert this into another function that is a non-saturating cost. Okay. [Student question] Yeah — the reason the blue one looks like that is because I added a minus sign here, so I'm flipping this. And it's the same thing; it's just the sign of the gradient that is going to be different. Like that, the gradient is high at the beginning and low at the end — that makes sense.
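To put numbers on that saturation (a quick sketch, not from the slides): the derivative of log(1 − x) is tiny where x = D(G(z)) is near 0 — exactly where training starts — while the derivative of −log(x) is large there.

```python
import numpy as np

x = np.array([0.01, 0.5, 0.99])        # D(G(z)); early training sits near 0.01

grad_saturating = -1.0 / (1.0 - x)     # d/dx of log(1 - x)
grad_non_saturating = -1.0 / x         # d/dx of -log(x)

print(np.abs(grad_saturating))         # ~1 near x=0, ~100 near x=1
print(np.abs(grad_non_saturating))     # ~100 near x=0, ~1 near x=1
```

With the saturating cost the generator barely learns at the start; the flipped cost puts the big gradients where they are needed.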
[01:04:53] So we're going to use this flip, and we have a new training procedure now where J(D) didn't change but J(G) changed: we have a minus sign here, and instead of log(1 − D(G(z))) we have log(D(G(z))). Does that make sense to everyone? Cool. And actually — this is a fun thing — you should check this paper, which is really cool: "Are GANs Created Equal?" It's a large study of many, many different GANs. It shows what people have tried, and you can see that people have tried all types of losses to make GANs trainable. It looks complicated here, but actually the MM GAN is the first one we saw together — the minimax loss function — and the second one is the non-saturating one we just saw. So you see, between the first two, the only difference is that on the generator, log(1 − D(x̂))
[01:05:54] becomes −log(D(x̂)). Okay. Now, another trick to train GANs is to use the fact that D is usually easier to train than G. But if D doesn't improve, G cannot improve — so you can see the performance of D as an upper bound on what G can achieve. Because of that, we will usually train D more times than we train G: we will basically alternate — k times D, one time G, k times D, one time G, and so on — so that the discriminator becomes better, then the generator can catch up, then D gets better, then G catches up, and so on. Does that make sense? There are also methods that use different learning rates for D and G to take this into account and train the discriminator faster. Okay — because we don't have too much time, I'm going to skip batch norm; we're going to see it together, probably next week, after you guys have seen the batch norm videos.
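That alternating schedule — k discriminator updates per generator update — can be sketched like this (the `d_step`/`g_step` callables are placeholders for real optimizer steps, not the lecture's code):

```python
def train_gan(d_step, g_step, num_iterations, k=5):
    """Alternate k discriminator updates with one generator update.

    d_step / g_step stand in for functions that run one SGD step on
    D or G and return the resulting loss.
    """
    history = []
    for _ in range(num_iterations):
        for _ in range(k):          # train D k times so it stays ahead...
            d_loss = d_step()
        g_loss = g_step()           # ...then train G once to catch up
        history.append((d_loss, g_loss))
    return history

# Toy run with dummy step functions that just return fixed losses:
log = train_gan(d_step=lambda: 0.7, g_step=lambda: 1.2, num_iterations=3)
print(log)  # [(0.7, 1.2), (0.7, 1.2), (0.7, 1.2)]
```

The ratio k (and, alternatively, a larger learning rate for D) is a tuning knob, not a fixed rule.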
[01:07:02] Okay, great, cool. So just to sum up, some tips to train GANs: modify the cost function — we've seen one modification, and there are many more; keep D up to date with respect to G, so update D more often than you update G; use virtual batch norm, which is a derivative of batch norm — a different type of batch norm is used here; and something called one-sided label smoothing, which I'm not going to talk about today because we don't have time. So let's see some nice results now — that's the most fun part. Some of you have worked with word embeddings, and you might know that word embeddings are vectors that can encode the meaning of a word, and you can sometimes compute operations on these words: if you take king minus queen, it should be equal to man minus woman — operations like that happen
[01:08:00] in the space of encodings. So here it's the same: you can use a generator to generate faces — the paper is listed at the bottom here. You give it a code, a random code, and it gives you an image of a face. You can give it a second code, and it's going to give you a second image that is different from the first one, because the code was different. You can give it a third one, and it's going to give you a third face. The fun part is, if you take code 1 minus code 2 plus code 3 — so basically, the image of a man with glasses, minus the image of a man, plus the image of a woman — it will give you an image of a woman with glasses. This is interesting, because it means that linear operations in the latent space of codes have a direct impact on the image space.
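A toy sketch of that code arithmetic (the `generate` function below is a stand-in, not the paper's model — the point is only that the arithmetic happens on latent vectors before generation):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 100                                 # a typical latent-code size

z_man_glasses = rng.normal(size=dim)      # code 1: man with glasses
z_man = rng.normal(size=dim)              # code 2: man
z_woman = rng.normal(size=dim)            # code 3: woman

# Linear arithmetic happens in latent space, before the generator:
z_new = z_man_glasses - z_man + z_woman

def generate(z):
    """Stand-in for a trained generator G: latent code -> image array."""
    return np.tanh(z).reshape(10, 10)     # fake 10x10 'image'

image = generate(z_new)                   # would be ~ woman with glasses
print(image.shape)  # (10, 10)
```

With a real trained generator, `generate(z_new)` is where the "woman with glasses" image would come out.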
[01:09:05] Okay, let's look at something even better. You can use GANs for image generation, of course — these are very nice samples. You see that sometimes GANs have problems with — oh no, I don't think that's a dog — but these are samples from a very impressive GAN that has been state of the art for a long time. Okay, so let's see something fun, something called image-to-image translation. Actually, the project winners last quarter, in spring, had a project dealing with exactly that: generating satellite images based on a map image. Given a map image, generate the satellite image using a GAN. You see that instead of giving a latent code that was 100-dimensional, you could give a very detailed code — the code can be this image — and you have to find a way to constrain your network in a certain way, to push it to output exactly the satellite image that corresponds to this map image. There are many other results that can be found: converting horses to zebras and zebras to horses, apples
[01:10:00] to oranges and oranges to apples. So let's do a case study together. Let's say our goal is to convert horses to zebras in images, and vice versa. Can you tell me what data we need? Let's go quickly, so that we have some time. [Student: horses and zebras.] Do you need paired images — you know, like, do you need the same image of your horse as a zebra? So the problem is: okay, we could have labeled images — a horse and its zebra doppelgänger in the same position — and we could train a network to take one and output the other. Unfortunately, not every horse has a doppelgänger that is a zebra, so we cannot do that. So instead we're going to do unpaired generative adversarial networks. It means we have a database of horses and a database of zebras, but these are different horses and different zebras. They're not one-to-one — there's no one-to-one mapping between them.
[01:11:00] There's no mapping at all. What architecture do you want to use? [Student answer] Nice — not really, okay. So let's see the architecture and the cost. I'm going to go over it quickly. It's a very fun GAN called CycleGAN. The way we're going to work it out is: we have a horse, called capital H, and we want to generate the zebra version of this horse. So we give it to a generator that we call G1 — you can call it H2Z, like horse-to-zebra — and it should give us this horse H as a zebra. And in fact, since we're training a GAN, we need a discriminator, so we will add a discriminator that is going to be a binary classifier, to tell us whether the image output by generator 1 is real or not. So this discriminator is going to take in some real images — of zebras, probably? Yes, zebras — and it's also going to take the generated images, and
[01:12:11] see which one is fake and which one is real. On the other hand, we're going to do the vice versa — and this is very important: we need to enforce the fact that this zebra, G1(H), should be the same horse as H. In order to do that, we're going to create another generator, which is going to take the generated image and generate back the input image, and this is where we will be able to enforce the constraint that G2(G1(H)) should be equal to H. Do you see why this loop is super important? Because if we don't have this loop, we don't have the constraint that the zebra should be the horse as a zebra — the same horse as H. So we'll do that, and we have a second discriminator to decide if this image is real. This is one step, H to Z; another step might be Z2H, where we start with the zebra, give it to a generator to generate the horse version of the zebra,
[01:13:12] discriminate, generate back the zebra, and discriminate again. Does that make sense? So this is the general pattern used in CycleGANs, and what I'd like to go over is: what loss should we minimize in order to enforce the fact that we want the horse converted to a zebra that is the same as the horse? Can someone give me the terms that we need? Does someone want to give it a try? Go for it. [Student: You want to make sure that the picture at the end — the zebra that you output — matches the zebra that you started with, or that the horse you output matches the horse you had initially. At the same time, you also need the discriminators identifying whether the image is a real zebra or a real horse, because you don't want it to just take the input image and output the same image back to you.]
[01:14:28] Okay, that's great. So you're saying we need the classic cost functions that we've seen previously, plus another one that is the matching between H and G2(G1(H)), and between Z and G1(G2(Z)). Correct. So we'll have all these terms: one term to train D1, which is the classic term we've seen — differentiate real images from generated images; G1 as well — same as before, we're using the non-saturating cost on generated images; same for D2, same for G2 — these are the classics. The one we need to add to all of this is the cycle cost, which is the distance between this term, G2(G1(H)), and H, and the same thing for zebras. Does that make sense? So you had the intuition to build that type of loss: we just sum everything, and that gives us the cost function we're looking for. [Student: Is it the same cost function for D1 and D2?] Yeah — you could, but it's not going to work that well, I think. Also, I think there's a tiny mistake here:
[01:15:37] the small zᵢ here should be a small hᵢ, and the small hᵢ on top should be a small zᵢ, because discriminator 1 is going to receive generated samples that look like zebras — they came out of G1 — so you want the real database that you give it to be zebras as well, to force the generator to output things that look like zebras; and vice versa for the second one. Okay. And this is my favorite: you can convert ramen into a face and back to ramen. It's the most fun application I found — it's from a Japanese research lab that is working hard on face-to-ramen. And actually, in two to three weeks you will learn object detection — you know, to detect faces — and once you learn that, maybe you can start a project to, like, detect a face and then replace it with ramen, because it's also funny — funny work by the same lab.
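Going back to the CycleGAN objective for a second: summing those terms — the adversarial terms plus the cycle cost — gives a loss of roughly this shape (a sketch with made-up helper names and toy tensors; the L1 cycle term and a weight λ follow the CycleGAN paper, but nothing here is the exact published code):

```python
import numpy as np

def adversarial_loss(d_scores_on_fakes):
    """Non-saturating generator term: -log D(fake)."""
    return -np.mean(np.log(d_scores_on_fakes))

def cycle_loss(x, x_reconstructed):
    """L1 distance between an input and its round-trip reconstruction."""
    return np.mean(np.abs(x - x_reconstructed))

def cyclegan_generator_loss(d1_on_fake_zebras, d2_on_fake_horses,
                            h, g2_g1_h, z, g1_g2_z, lam=10.0):
    """Adversarial terms for G1 and G2 plus the weighted cycle cost."""
    adv = adversarial_loss(d1_on_fake_zebras) + adversarial_loss(d2_on_fake_horses)
    cyc = cycle_loss(h, g2_g1_h) + cycle_loss(z, g1_g2_z)
    return adv + lam * cyc

# Toy tensors standing in for images and discriminator outputs:
h, z = np.ones((4, 4)), np.zeros((4, 4))
loss = cyclegan_generator_loss(
    d1_on_fake_zebras=np.array([0.6]), d2_on_fake_horses=np.array([0.4]),
    h=h, g2_g1_h=h * 0.9, z=z, g1_g2_z=z + 0.1)
print(round(loss, 3))  # 3.427
```

The discriminators D1 and D2 keep their own standard binary cross-entropy losses, trained separately.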
[01:16:43] There's also some funny work out there along these lines. Okay, oh, this is a super cool application as well, so let's look at that. This model is a conditional GAN that was conditioned on edges: it learned to take edges and generate cats based on the edges. So I'm going to try to draw a cat; sorry, I cannot see well, and I'm not a good drawer. [01:17:25] That's the cat, okay, it's going through the model; I hope it's going to work. [01:17:43] Okay, I don't think it works, but it's supposed to work. So you can generate cats based on edges, and you can do it for different things; you can do it for a shoe. All these models have been trained for that. [01:18:02] Yes, go for it. [Student question] [01:18:15] Right, you have to train it specifically for the domain, so these models are each trained on a different dataset. Okay, looking for my presentation; I missed it, the presentation disappeared. Okay.

[01:18:39] Another application is super-resolution: you give a low-resolution image and generate the super-resolution version of it using GANs. And this is pretty cool, because you can take a high-resolution image, downsample it, and use that as the minimax game: you have the high-resolution version of the very low-resolution image. Other applications can be privacy-preserving. Some people have been working in the medical space, where privacy is a huge issue; commonly, you cannot share datasets among hospitals or among medical teams. So people have been looking at generating a dataset that looks like a medical dataset: if you train a model on it, it's going to give you the same type of parameters as the real one, but the dataset is anonymized. So hospitals can share the anonymized data with each other and train their models, without being able to access the information of the patient.
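The pair-construction trick just described for super-resolution (take a high-resolution image, downsample it, and train the generator to recover the original) can be sketched in a few lines. This is a minimal NumPy illustration of my own, not the pipeline from the lecture; average pooling is an assumed choice of downsampler:

```python
import numpy as np

def make_sr_pair(hr, factor=4):
    """Turn one high-resolution grayscale image into a (low-res, high-res)
    training pair by average-pooling over factor x factor blocks."""
    h, w = hr.shape
    assert h % factor == 0 and w % factor == 0
    lr = hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return lr, hr  # generator input, generator target

hr = np.random.rand(64, 64)      # stand-in for a real high-resolution image
lr, target = make_sr_pair(hr)
print(lr.shape, target.shape)    # (16, 16) (64, 64)
```

The generator learns the mapping from lr to target, while the discriminator plays the minimax game of telling generated high-resolution images from real ones.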
[01:19:43] And manufacturing is important as well: GANs can generate very specific objects that can replace bones for humans, personalized to the human body. Same for dental: if you lose a tooth, the technician can take a picture and decide what the crown should look like, and the GAN can generate it.

[01:20:10] Another topic is how to evaluate GANs. You might say we can just look at the images and see if they look real, and that will give us an idea of whether the GAN is working well. In practice it's hard, because maybe the images you're looking at are overfitted copies of the real samples you gave to the discriminator. So how do you check that? It's very complicated. Human annotation is a big one: you build a piece of software, push it to the cloud, and people around the world select which images look generated and which look real, to see whether a human can tell your GAN's output from real-world data, and how your GAN performs. It would look like this: a web app where you indicate which image is fake and which is real. You can run different experiments: you can flash an image for a fraction of a second and ask whether it was real or not, or you can give people unlimited time.

[01:21:09] There's another approach that is more scalable, because human annotation is very painful; every time you train a GAN you'd want to do this to verify it's working well, and it takes a lot of time. So instead of using humans, why don't we use a network that is very good at classification? In fact, the Inception network is a tremendous classification network. We're going to give our image samples to this Inception network and see what the network thinks of each image: does it think it's a dog or not, does it look like a dog to the network? We can scale this and make it very quick. There's an Inception score that we can talk about next week when we have time; it measures the quality of the samples and also the diversity of the samples, and I'll go over it next week. There's another distance that has been growing in popularity recently, called the Fréchet Inception Distance, and I advise you to check some of these papers if you're interested in them for your projects.

[01:22:10] So, just to end: for next Wednesday you'll have C2M3 and also the whole C3 modules, so you'll have three quizzes. Be careful: the two quizzes C3M1 and C3M2 are longer than the normal quizzes, they're like case studies, so take your time and go over them. And you'll have one programming assignment.
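The Inception score mentioned above has a compact form: feed generated samples through a classifier, then take the exponential of the average KL divergence between each conditional class distribution p(y|x) and the marginal p(y). Here is a small numeric sketch of that formula (my own illustration, not course code):

```python
import numpy as np

def inception_score(probs):
    """Inception score from classifier softmax outputs, one row per image:
    IS = exp( mean_x KL(p(y|x) || p(y)) ). It is high when each image is
    confidently classified (quality) and the class marginal is spread out
    (diversity)."""
    p_y = probs.mean(axis=0)  # marginal class distribution over all samples
    kl = (probs * (np.log(probs) - np.log(p_y))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Confident and diverse predictions over 10 classes score high...
confident = np.eye(10) * 0.990 + 0.001   # each row peaks on a different class
print(inception_score(confident))        # close to the 10-class maximum
# ...while uninformative uniform predictions score exactly 1.
uniform = np.full((10, 10), 0.1)
print(inception_score(uniform))          # 1.0
```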
[01:20:32] Make sure you understand the batch norm videos, so that hopefully we can go over virtual batch norm together next week and in the hands-on section this Friday. You will receive feedback on your project proposal as soon as possible; meet with your project TAs to go over the proposal and to make decisions regarding the next steps for your projects. I'll stick around in case you have any questions. Okay, thanks guys.

================================================================================ LECTURE 005 ================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 5 - AI + Healthcare Source: https://www.youtube.com/watch?v=IM9ANAbufYM --- Transcript

[00:00:05] Thanks for being here. Welcome to lecture five of CS230. Today we have the chance to host a guest speaker, Pranav Rajpurkar, who is a PhD student in computer science advised by Professor Andrew Ng and Professor Percy Liang. Pranav is working on AI and high-impact projects, specifically related to healthcare and natural language processing, and today he is going to present an overview of AI for healthcare and dig into some projects he has led, through case studies. So don't hesitate to interact; I think we have a lot to learn from Pranav, and he's really an industry expert on AI for healthcare. I'll hand you the mic, Pranav. Thanks for being here.

[00:01:00] Thanks, Kian, thanks for inviting me. Can you hear me at the back? Is the mic on? All right, fantastic. Really glad to be here. I want to cover three things today. The first is to give you a broad overview of what AI applications in healthcare look like. The second is to bring you three case studies from the lab that I'm in, as demonstrations of AI-and-healthcare research. And then finally, some ways that you can get involved if you're interested in applying AI to high-impact problems in healthcare, or if you're from a healthcare background as well. Let's
start with the first. [00:01:48] One way we can decompose the kinds of things AI can do in healthcare is to formulate levels of questions that we can ask of data. At the lowest level are descriptive questions: here we're really trying to get at what happened. Then there are diagnostic questions, where we're asking why it happened: if a patient had chest pains and I took their X-ray, what does that chest X-ray show? If they have palpitations, what does their ECG show? Then there are predictive problems, where I care about the future: what's going to happen in the next six months? And then, at the highest level, are prescriptive problems. Here I'm really trying to ask: okay, I know this is the patient, these are the symptoms they're coming in with, this is what their trajectory will look like in terms of the things they're at risk of; what should I do? This is the real action point, and that's, I would say, the goldmine, but to get there requires a lot of data and a lot of steps, and we'll talk a little bit more about that.

[00:03:06] So in CS230 you're all well aware of the paradigm shift of deep learning, and if we look at the machine-learning-in-healthcare literature, we see that it has a very similar pattern. We had a feature-extraction engineer who was responsible for getting from the input to a set of features that a classifier can understand, and the deep learning paradigm is to combine feature extraction and classification into one step by automatically extracting features, which is cool. Here's what I think will be the next paradigm shift for AI in healthcare, but also more generally: we still have a deep learning engineer up here, that's you, that's me, designing the networks and making decisions like "a convolutional neural network is the best architecture for this problem", or picking the specific type of architecture: there's an RNN and a CNN and whatever-NN you can throw in there. But what if we could swap out the ML engineer as well? I find this quite funny, because a question that I get asked a lot in AI for healthcare is: are we going to replace doctors with all these AI solutions? And nobody actually realizes that we might replace machine learning engineers faster than we replace doctors. For this to be the case, a lot of research is developing algorithms that can automatically learn architectures, some of which you might go through in this class.

[00:04:46] Great, so that's the general overview. Now I want to talk about three case studies from the lab, of AI being applied to different problems, and because healthcare is so broad, I thought I'd focus in on one narrow
vertical and let us go deep on that: medical imaging. So I've chosen three problems; the first is a 1D problem, the second is a 2D problem, and the third is a 3D problem, so I thought we could walk through all the different kinds of data here.

[00:05:22] This is some work that was done early last year in the lab, where we showed that we were able to detect arrhythmias at the level of cardiologists. Arrhythmias are an important problem that affects millions of people. This has especially come to light recently with devices like the Apple Watch, which now has ECG monitoring. And the thing about this is that sometimes you might have symptoms and know that you have arrhythmias, but other times you may not have symptoms and still have arrhythmias that could be addressed, if you were to do an ECG. The ECG test basically shows the heart's electrical activity over time. The electrodes are attached to the skin, it's a safe test, and it takes a few minutes; this is what it looks like when you're hooked up to all the different electrodes. So this test is often done for a few minutes in the hospital, and the finding is basically that in a few minutes you can't really capture a person's abnormal heart rhythms. So let's send them home for 24 to 48 hours with a Holter monitor and see what we can find. There are more recent devices, such as the Zio patch, which let patients be monitored for up to two weeks, and it's quite convenient: you can use it in the shower or while you're sleeping, so you really can capture a lot of what's happening in the heart's ECG activity. But if we look at the amount of data generated in two weeks, it's 1.6 million heartbeats. That's a lot, and there are very few doctors who'd be willing to go through two weeks of ECG readings for each of their patients, and this really motivates why we need automated interpretation here.

[00:07:22] But automated detection comes with challenges. One of them is that in the hospital you have several electrodes, and in the more recent devices we have just one. The way one can think of several electrodes is that the electrical activity of the heart is 3D, and each of the electrodes gives a different 2D perspective into that 3D activity; now that we have only one lead, we only have one of these perspectives available. The second is that the differences between heart rhythms are very subtle. This is what a cardiac cycle looks like, and when we're looking at arrhythmias or normal heart rhythms, one is going to look at the substructures within the cycle, and at the structure between cycles as well, and the differences are quite subtle.

[00:08:26] So when we started working on this problem... oh, maybe I should share this story. When we started working on this problem, it was me, my collaborator Awni, and Professor Ng, and one of the things he mentioned we should do, he said: let's just go out and read ECG books, and let's do the exercises. If you're in med school there are these books where you can learn about ECG interpretation, and then there are several exercises you can do to test yourself. So I went to the med school library (you know, they have those hand-crank shelves at the bottom you have to move), grabbed my books, and for two weeks we went through two books and learned ECG interpretation, and it was pretty challenging. And if we looked at
the prior literature, I think it was drawing on some domain knowledge, in that: here we're looking at waves, so how can we extract from the waves specific features that doctors are also looking at? So there was a lot of feature engineering going on. If you're familiar with wavelet transforms, they were the most common approach, with a lot of different mother wavelets and so on, plus pre-processing and bandpass filters; everything you can imagine doing with signals was done, and then you fed it into your SVM and called it a day.

[00:09:54] Now with deep learning we can change things up a bit. On the left we have an ECG signal, and on the right are just three heart rhythms; we're going to call them A, B and C, and we're going to learn the mapping straight from the input to the output. Here's how we're going to break it up: we're going to say that every label labels the same amount of the signal, so if we had four labels, the ECG would be split into four parts, and this rhythm labels this part. And then we're going to use a deep neural network. We've built a 1D convolutional neural network which runs over the time dimension of the input, because remember, we're getting one scalar over time, and this architecture is 34 layers deep.

[00:10:54] So I thought I'd talk a little bit about the architecture. Have you seen ResNets before? Okay, so should I go into this? Okay, cool, here's my one-minute spiel of ResNet then. As you go deeper in terms of the number of layers in a network, you should be able to represent a larger set of functions. But when we look at the training error for these very deep networks, what we find is that it's worse than for a smaller network. Now, this is not the validation error; this is the training error. That means that even with the ability to represent a more complex function, we aren't able to fit the training data. So the motivating idea of residual networks is to say: hey, let's add shortcuts within the network, so as to minimize the distance from the error signal to each of my layers. This is just math to say the same thing. Further work on ResNet asked: okay, we have the shortcut connection, so how should we make information flow through it best? And the finding was basically that anything you add onto the shortcut (think of these as stop signs or signals on a highway) slows things down; the fastest highway has nothing but addition on it. Then there were a few advancements on top of that, like adding dropout and increasing the number of filters in the convolutional neural network, which we also added to this network.
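The two ingredients just described, a 1D convolution running over the time dimension and a residual shortcut that adds the block's input straight to its output, can be sketched as follows. This is an illustrative NumPy toy, not the actual 34-layer network:

```python
import numpy as np

def conv1d(x, w):
    """'Same'-padded 1D convolution over time: x has shape (T,), w has shape (k,)."""
    k = len(w)
    xp = np.pad(x, (k // 2, k // 2))
    return np.array([np.dot(xp[t:t + k], w) for t in range(len(x))])

def residual_block(x, w1, w2):
    """y = x + F(x): the shortcut carries x past two conv layers (with a ReLU),
    keeping a short path from the loss back to early layers."""
    h = np.maximum(conv1d(x, w1), 0.0)   # conv + ReLU
    return x + conv1d(h, w2)             # nothing but addition on the shortcut

T = 16
x = np.random.randn(T)                   # one-lead ECG: one scalar per time step
w1, w2 = np.random.randn(3), np.random.randn(3)
y = residual_block(x, w1, w2)
print(y.shape)                           # (16,)
# Zeroing the residual branch leaves exactly the identity:
print(np.allclose(residual_block(x, w1, np.zeros(3)), x))  # True
```

Stacking blocks like this (with pooling and growing numbers of filters) and reading out one rhythm label per fixed-length segment of the signal gives the general shape of the network described above.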
of filters in the convolutional neural network that we [00:12:47] convolutional neural network that we also added to this network okay so [00:12:50] also added to this network okay so that's the convolutional neural network [00:12:52] that's the convolutional neural network let's talk a little bit about data so [00:12:56] let's talk a little bit about data so one thing that was cool about this [00:12:57] one thing that was cool about this project was that we got to partner up [00:13:00] project was that we got to partner up with a with a startup that manufactures [00:13:05] with a with a startup that manufactures these hardware patches and we got data [00:13:08] these hardware patches and we got data off of patients who were wearing these [00:13:10] off of patients who were wearing these patches for up to two weeks and this was [00:13:15] patches for up to two weeks and this was from around 30,000 patients and this is [00:13:19] from around 30,000 patients and this is 600 times bigger than the largest data [00:13:21] 600 times bigger than the largest data set that that was out there before and [00:13:25] set that that was out there before and for each of these ECG signals what [00:13:28] for each of these ECG signals what happened [00:13:29] happened is that each of them is annotated by a [00:13:32] is that each of them is annotated by a clinical ECG expert who says here's [00:13:35] clinical ECG expert who says here's where rhythm a starts and here's where [00:13:37] where rhythm a starts and here's where ends so let's mark the whole ECG that [00:13:39] ends so let's mark the whole ECG that way obviously very time-intensive [00:13:41] way obviously very time-intensive but a good data source and then we had a [00:13:44] but a good data source and then we had a test set as well and here we use here we [00:13:48] test set as well and here we use here we use a committee of cardiologists so [00:13:51] use a committee of cardiologists so they'd get together sit in a 
room, and decide: OK, we disagree on this specific point, let's discuss which one of us is right, or what this rhythm actually is. So they arrive at a ground truth after discussion. [00:14:06] And then we can of course test cardiologists as well, and the way we do this is we have them do it individually. This is not the same set that did the ground truth; there's a different set of cardiologists coming in, one at a time: you tell me what's going on here, and we're going to test you. [00:14:18] So when we compared the performance of our algorithm to cardiologists, we found that we were able to surpass them on the F1 metric, which combines precision and recall. And when we looked at where the mistakes were made, we could see that the biggest mistake was in distinguishing two rhythms which look very, very similar but actually don't have a difference in treatment.
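The F1 metric mentioned here is the harmonic mean of precision and recall; a minimal sketch from raw error counts:

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)   # of the positives we predicted, how many were right
    recall = tp / (tp + fn)      # of the true positives, how many we found
    return 2 * precision * recall / (precision + recall)
```

For example, 8 true positives with 2 false positives and 2 false negatives gives precision = recall = 0.8, so F1 = 0.8.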
[00:14:55] Here's another case where the model is not making a mistake which the experts are making, and it turns out this is a costly mistake: what experts thought was a benign heart rhythm was actually a pretty serious one. So that's one beauty of automation, that we're able to catch these misdiagnoses. [00:15:21] Here are three heart blocks which are clinically relevant to catch, on which the model outperforms the experts, and on atrial fibrillation, which is probably the most common serious arrhythmia, the same holds. [00:15:39] One of the things that's neat about this application, and a lot of applications in health care, is that what automation with deep learning and machine learning enables is for us to continuously monitor patients, and this is not something we've been able to do before. So even a lot of the science of understanding
what patients' risk factors are, or how they change, hasn't been done before, and this is an exciting opportunity to be able to advance science as well. [00:16:09] And the Apple Watch has recently released its ECG monitoring, and it'll be exciting to see what new things we can find out about the health of our hearts from these inventions. OK, so that was our first question, yeah? [00:16:42] So to repeat the question: how was it to deal with data privacy, and to keep patients' information private? In this case we had completely de-identified data, so it was just an ECG signal, without any extra information about their clinical records or anything like that. [00:17:07] [Audience follow-up, partly inaudible, about whether a credible authority, such as the hospitals, signed off on the data use.] Sure, and I think we can
take this question offline as well. But one of the beauties of working at Stanford is that there's a lot of industry research collaboration, and we have great infrastructure to be able to work with that. [00:17:40] Which brings me to my second case study. Sorry, yeah, go for it. [00:18:05] That's a good question. So just to repeat the question: how did we define a gold standard, given that it's experts who set the gold standard? Here's how we did it. One way to come up with a gold standard is to ask: if we looked at a consensus, what would it say? And so we got three cardiologists in a room to set the gold standard. Then, to compare the performance of experts, these were individuals separate from that group of cardiologists, who sat in another room and said what they thought of the ECG signals. That way there can be some disagreement, with the gold standard set
by the committee. [00:18:48] Great. So here we looked at how we can detect pneumonia off of chest X-rays. Pneumonia is an infection that affects millions in the U.S.; its big global burden is actually in kids, so that's where it's really useful to be able to detect it automatically and well. [00:19:18] To detect pneumonia there's a chest X-ray exam, and chest X-rays are the most common imaging procedure, with two billion chest X-rays done per year. The way abnormalities are detected in chest X-rays is that they present as areas of increased density: where things should appear dark they appear brighter, or vice versa. [00:19:52] And here's what pneumonia characteristically looks like, something like a fluffy cloud. But this is an oversimplification, of course, because pneumonia is when the alveoli fill up with pus, and the alveoli can fill up with a lot of other things as well, which lead to very
different interpretations, diagnoses, and treatment for the patient. So it's quite confusing, which is why radiologists train for years to be able to do this. [00:20:19] The setup is: we'll take an input image of someone's chest X-ray and output a binary label, 0 or 1, which indicates the presence or absence of pneumonia, and here we use a 2D convolutional neural network which is pre-trained on ImageNet. [00:20:41] OK, so we looked at shortcut connections earlier, and DenseNets had this idea of taking shortcut connections to the extreme: what happens if we connect every layer to every other layer, instead of having just the one shortcut that ResNet had? And DenseNet beat the previous state of the art, with generally lower error and fewer parameters, on the ImageNet challenge. So that's what we used.
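The dense connectivity just described can be sketched abstractly: each layer consumes the concatenation of the block input and all earlier layers' outputs. This is a hedged sketch, not DenseNet itself (the real blocks use convolutional layers with a fixed growth rate of output channels):

```python
import numpy as np

def dense_block(x, layers):
    """Each layer sees the channel-wise concatenation of the input
    and every earlier layer's output: the DenseNet connectivity."""
    features = [x]
    for layer in layers:
        features.append(layer(np.concatenate(features, axis=-1)))
    return np.concatenate(features, axis=-1)
```

If each layer emits k new channels (the growth rate), a block over a w-channel input ends with w + len(layers) * k channels, which is where the parameter efficiency comes from: layers stay narrow while reusing all earlier features.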
[00:21:20] For the data set: when we started working on this project, which was around October of last year, there was this large data set that had been released by the NIH, a hundred thousand chest X-rays, and this was the largest public data set at the time. Here each X-ray is annotated with up to 14 different pathologies, and the way this annotation works is that there's an NLP system which reads a report and then outputs, for each of several pathologies, whether there is a mention, or whether there is a negation, like "not pneumonia" for instance, and then annotates accordingly. [00:22:03] And then for a test set we had four radiologists here at Stanford independently annotate and tell us what they thought was going on in those X-rays. [00:22:16] So one of the questions that comes up often in medical imaging is: we have a model, we have several experts, but we don't really have a ground truth.
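The mention/negation logic can be illustrated with a toy rule-based labeler. This is purely a sketch under assumed negation cues; the actual NIH labeling system is far more elaborate:

```python
import re

NEGATION_CUES = ("no ", "not ", "without ", "negative for ")  # assumed cues

def label_report(report, pathology):
    """Return 1 for a positive mention, 0 for a negated mention,
    and None when the pathology is not mentioned at all."""
    for sentence in re.split(r"[.;]", report.lower()):
        if pathology in sentence:
            return 0 if any(cue in sentence for cue in NEGATION_CUES) else 1
    return None
```

A labeler like this turns free-text radiology reports into the per-pathology labels used for training, at the cost of some label noise when the rules misfire.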
We don't have a ground truth for several reasons; one of them is just that it's difficult to tell whether someone had pneumonia or not without additional information, like their clinical record, or even: once you gave them antibiotics, did they get treated? [00:22:49] So really, one way to evaluate whether a model is better than a radiologist, or doing as well as the radiologist, is by asking: do they agree with other experts similarly? So that's the idea we use here. We say, OK, let's have one of the radiologists be the prediction model we're evaluating, and let's set another radiologist to be the ground truth, and now we're going to compute the F1 score. Then change the ground truth and do it a second time, change it again for a third, and then also use the model as the ground truth and do it again. And we can use the very same symmetric evaluation scheme, but this time having the model be evaluated against each of the four experts.
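The symmetric scheme can be sketched as: score every rater, model included, by its mean F1 against each other rater taken in turn as the ground truth. This is a simplified single-label binary sketch of the idea, not the paper's exact evaluation code:

```python
def f1(pred, truth):
    """F1 between two binary label vectors."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # perfect agreement on all-negative

def symmetric_f1(raters):
    """Mean F1 of each rater against every other rater as ground truth."""
    return {
        name: sum(f1(pred, other) for k, other in raters.items() if k != name)
              / (len(raters) - 1)
        for name, pred in raters.items()
    }
```

Because every rater is scored the same way, the model's number is directly comparable to each expert's, even though no single rater is privileged as the truth.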
[00:23:45] So we do that, and then we get a score for all of the experts and for the model, and we showed in our work that we were able to do better than the average radiologist at this task. [00:23:55] Two ways to extend this in the future are to look at patient history as well, and to look at lateral radiographs, to be able to improve upon this diagnosis. At the time at which we released our work, we were able to outperform the previous state of the art on all 14 pathologies. [00:24:25] OK, so model interpretation. [00:24:33] [Audience question, partly garbled in the recording: a patient with pneumonia presents to the doctor with symptoms like fever and cough, but those aren't included in the model. If you're using a data set only to determine whether this person has pneumonia, you're not looking at other conditions, say cancer or black lung disease, that aren't in the images you're training on. The obvious case doesn't give you much trouble, but the tough case is a patient with a fever who's coughing violently: is it cancer, pneumonia, or black lung disease? How do you get the algorithm working in that condition, and, to keep it technical for the class, is there a neural network architecture you would use to solve this, multi-task learning perhaps?] Sure, sure. [00:25:49] OK, so let me try to boil that set of questions down. One is: patients are coming in, and we're not getting access to their clinical histories, so how are we able to make this determination at all? One thing is that when we're training the algorithm, we're training it on pathologies extracted from radiology reports, and these radiology reports are written with an understanding of the full clinical history, and an understanding of what the patient presented with in terms of symptoms as well. So we're training the model on these radiology reports, which had access to more information. [00:26:32] And the second is that the utility of this is not as much in being able to compare a patient's X-rays
day to day, as much as: here is a new patient with a set of symptoms, and can we identify things from their chest X-rays? Which brings us to model interpretation: what if you were an end-user of the model? [00:27:05] So when I was back in undergrad and I was in the lab, we were working on autonomous cars, and I thought about this a lot. How many of you have been in an autonomous car? How many of you would trust being in an autonomous car? [00:27:29] Yeah, I thought about this as well: would I trust being in an autonomous car? And I thought it'd be pretty sweet if the algorithm that was in the car would tell me whatever decision it was going to make in advance (I know that's not possible at high speeds), so that, just in case I disagreed with a particular decision, I could say no, abort, and have the model remake its decision. [00:27:53] And I think the same holds true in healthcare
as well. The one advantage in healthcare is that rather than having to make decisions within seconds, as in the case of the autonomous car, there is often a larger time frame, like minutes or hours, that we have. And here it's useful to be able to inform the clinician who's treating the patient, to say: hey, here's what my model thought, and why. [00:28:23] So here's the technique we use for that: class activation maps, which you may cover in another lecture, so I'll just leave it at saying that there are ways of looking at which parts of the image are most indicative of a particular pathology, to generate these heat maps. [00:28:50] So here's a heat map generated for pneumonia: this X-ray has pneumonia, and the algorithm, in red, is able to highlight the areas it thought were most problematic.
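For a network ending in global average pooling plus a linear classifier, a class activation map can be sketched as a weighted sum of the final convolutional feature maps, each weighted by the classifier weight its pooled value receives for the class of interest. A minimal sketch with illustrative shapes:

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM sketch: feature_maps is (C, H, W), class_weights is (C,).

    Sum the feature maps weighted by the class's classifier weights,
    then normalize to [0, 1] so the result can be shown as a heat map.
    """
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # (H, W)
    cam -= cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting low-resolution map is typically upsampled to the input size and overlaid in color on the X-ray, which is what produces the red highlights described here.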
[00:29:11] Here's one in which it's able to find a collapsed right lung; here's one in which it's able to find a small cancer. [00:29:20] And here the goal is to be able to improve healthcare delivery, where in the developed world one of the things this is useful for is prioritizing the workflow, making sure the radiologists are getting to the patients most in need of care before the ones whose X-rays look more normal. [00:29:42] But the second part, which I'm quite excited about, is to increase access to medical imaging expertise globally, where right now the World Health Organization estimates that about two-thirds of the world's population does not have access to diagnostics. And so we thought, hey, wouldn't it be cool if we just made an app that allowed users to upload images of their X-rays and was able to give its diagnosis? So this is still in the works, so I'll show
you what we've got running locally. [00:30:25] So here I'm presented with a screen that asks me to upload an X-ray, and I have several X-rays here, and I'm going to pick the one that says cardiomegaly. Cardiomegaly refers to enlargement of the heart. [00:30:44] So I uploaded it, and now the model's running in the backend, and within a couple of seconds it has output its diagnosis on the right. So you'll see the 14 pathologies that the model is trained on being listed, and next to each of them a bar, and at the top of this list is cardiomegaly, which is what this patient has, with the highest output. [00:31:10] And if I hover on cardiomegaly, I can see the probability displayed there. And since we talked about interpretation: how do I believe that this model is actually looking at the heart rather than at something else? If I click on it, I get the class activation map
for this, which shows that indeed it is focused on the heart, and is looking at the right thing. So I guess you can say the algorithm's heart is in the right place. [00:31:49] Cool. So this is an image that I got from the NIH data set that we were using, but it's pretty cool if an algorithm is able to generalize to populations beyond that. So I thought what we could do is just look up an image of cardiomegaly and download it, and see if our model is able to handle it. [00:32:14] This one looks pretty large; so does this one; I don't want an annotated one. All right, that's good, so we can take that, save it to the desktop, and now we can upload it here. [00:32:46] And it's already done its thing, and at the top is cardiomegaly once again, and there's the highlight. So it's able to generalize to populations beyond just the ones it was trained on, and I'm very excited by that. And what I got even more
even more [00:33:10] excited by that and what I got even more excited by is we're thinking of [00:33:13] excited by is we're thinking of deploying this out in out in different [00:33:16] deploying this out in out in different parts of the world and when we got an [00:33:19] parts of the world and when we got an image that showed how x-rays are read in [00:33:24] image that showed how x-rays are read in this hospital that we were working with [00:33:27] this hospital that we were working with in Africa this is what we saw and so the [00:33:32] in Africa this is what we saw and so the idea that one could snap a picture and [00:33:34] idea that one could snap a picture and upload it seems and get a diagnosis [00:33:37] upload it seems and get a diagnosis seems very powerful so the third case [00:33:42] seems very powerful so the third case study I want to take you through is [00:33:44] study I want to take you through is being able to look at M R so we've [00:33:46] being able to look at M R so we've talked about 1d a 1d setup where we had [00:33:50] talked about 1d a 1d setup where we had an ECG signal we've talked about a 2d [00:33:52] an ECG signal we've talked about a 2d setup with an x-ray how many of you [00:33:55] setup with an x-ray how many of you thinking of working on a 3d problem for [00:33:59] thinking of working on a 3d problem for your project whew that's good cool so [00:34:07] your project whew that's good cool so here we looked at niyama so mrs of the [00:34:10] here we looked at niyama so mrs of the knee is the standard of care to evaluate [00:34:13] knee is the standard of care to evaluate knee disorders and more mr examinations [00:34:17] knee disorders and more mr examinations are performed on the knee than any other [00:34:19] are performed on the knee than any other part of the body and the question that [00:34:24] part of the body and the question that we sought out to answer was can we [00:34:27] we sought out to answer was can we identify 
knee abnormalities [00:34:32] (two of the most common ones being an ACL tear and a meniscal tear) at the level of radiologists? Now, with the 3D problem, one thing that we have that we don't have in a 2D setting is the ability to look at the same thing from different angles. So when radiologists do this diagnosis, they look at three views, the sagittal, the coronal, and the axial, which are three ways of looking through the 3D structure of the knee. And in an MR you get different types of series based on the magnetic fields, and so there are three different series that are used. [00:35:19] What we're going to do is output, for a particular knee MR examination, the probability that it's abnormal, the probability of an ACL tear, and the probability of a meniscal tear. An important thing to recognize here is that this is not a multi-class problem, in that I could
have both types of tears; [00:35:41] it's a multi-label problem. So we're going to train a convolutional neural network for every view-pathology pair, so that's nine convolutional networks, and then combine them together using a logistic regression. [00:36:06] Here's what each convolutional neural network looks like: I have a bunch of slices within a view, I'm going to pass each of them to a feature extractor, and I'm going to get an output probability. We had 1,400 knee MR exams from the Stanford Medical Center, and we tested on 120 of them, where the majority vote of three subspecialty radiologists established the ground truth. We found that we did pretty well on the three tasks, with the model able to pick up the different abnormalities pretty well, and one can extend these methods of interpretability to 3D inputs as well. So that's what we did here.
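The view-by-pathology setup described here can be sketched in a few lines. This is an illustrative mock-up, not the lab's actual code: the nine CNNs are faked with label-correlated scores, and the per-pathology combiner is a tiny hand-rolled logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
views = ["sagittal", "coronal", "axial"]
pathologies = ["abnormal", "acl", "meniscus"]  # multi-label: not mutually exclusive

# Stand-ins for the nine trained view/pathology CNNs: each maps an exam
# to one probability. Here they are faked with label-correlated scores.
n = 200
labels = {p: rng.integers(0, 2, n) for p in pathologies}
cnn_prob = {(v, p): labels[p] * 0.5 + rng.uniform(0.0, 0.5, n)
            for v in views for p in pathologies}

def fit_logreg(X, y, lr=0.5, steps=500):
    """Tiny logistic regression trained by gradient descent (illustration only)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        z = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        g = z - y                               # gradient of the log loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# One combiner per pathology, fed that pathology's three view probabilities.
combiner = {}
for p in pathologies:
    X = np.column_stack([cnn_prob[(v, p)] for v in views])  # shape (n, 3)
    combiner[p] = fit_logreg(X, labels[p])

def predict(p, exam_idx):
    """Probability of pathology p for one exam, combining the three views."""
    w, b = combiner[p]
    x = np.array([cnn_prob[(v, p)][exam_idx] for v in views])
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))

print({p: round(predict(p, 0), 3) for p in pathologies})
```

Because the labels are not mutually exclusive, each pathology gets its own combiner and its own independent probability, rather than one softmax over three classes.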
[00:37:09] Okay, so I saw this cartoon a few weeks ago and I thought it was pretty funny. The joke is that a lot of machine learning engineers think they don't need to externally validate, which is: find out how my model works on data that's not from where my original dataset came from, so there's a difference in distribution. But it's really quite exciting when a model does generalize to datasets that it's not seen before. [00:37:46] And so we got this dataset that's public, from a hospital in Croatia, and here's how it was different: it was a different kind of series, with different magnetic properties, it was a different scanner, and it was a different institution in a different country. And we asked: okay, what happens when we run this model off the shelf, trained on Stanford data but tested on that kind of data?
[00:38:12] And we found that it did relatively well without any training at all, but then when we trained on it, we found that we were able to outperform the previously best reported result on the dataset. So there's still some work to be done in being able to generalize, sort of, my network here that was trained on my data to work on datasets from different institutions and different countries as well, but we're making some steps along the way; it remains a very open problem. [00:38:53] [Audience question.] Yeah, so we did the best we could in terms of processing. One of the pre-processing steps that's important is getting the mean of the input data to be as close as possible to the mean of the input data that you trained on. So that was one pre-processing step we tried; beyond that we tried to minimize processing, to say: out of the box, how would this work if we had never seen this data before?
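That mean-matching step can be sketched as follows. The talk only mentions matching the mean; matching the spread as well is a common variant and is my addition here, and all the arrays are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: training images from one scanner, and external
# images from a different scanner with a shifted intensity distribution.
train_imgs = rng.normal(loc=100.0, scale=20.0, size=(50, 32, 32))
external_imgs = rng.normal(loc=160.0, scale=35.0, size=(10, 32, 32))

# Shift (and here also rescale) the external data so its global mean
# matches the training data's: the model then sees inputs in the
# intensity range it was trained on.
train_mean, train_std = train_imgs.mean(), train_imgs.std()
ext_mean, ext_std = external_imgs.mean(), external_imgs.std()
matched = (external_imgs - ext_mean) / ext_std * train_std + train_mean

print(matched.mean() - train_mean)  # ~0: the means now agree
```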
[00:39:23] How would it work on that population? So, one big topic across a lot of applied fields is asking the question: okay, we're talking about models working automatically, autonomously; how would these models work when working together with experts in different fields? And here we asked that question about radiologists and about imaging models: would it be possible to boost the performance if the model and the radiologist work together? [00:40:08] So that's really the setup: a radiologist with the model, is that better than the radiologist by themselves? And here's how we set it up: we said, let's have experts read the same case twice, separated by a certain number of weeks, and then see how they would perform on the same set of cases. And what we found is that we were able to increase the performance generally, with a significant increase in specificity for ACL tears.
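To make the metric concrete: specificity is the fraction of truly negative cases correctly called negative. The numbers below are made up for illustration; they are not the study's results.

```python
import numpy as np

# Toy reads for "ACL tear" (1 = tear present). Cases 0-5 have no tear.
truth      = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
unassisted = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 1])  # radiologist alone
assisted   = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 1])  # radiologist + model

def sens_spec(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    # sensitivity: tears found; specificity: no-tear cases correctly cleared
    return float(tp / (tp + fn)), float(tn / (tn + fp))

print(sens_spec(truth, unassisted))  # (0.75, 0.5): 3 of 6 no-tear cases flagged
print(sens_spec(truth, assisted))    # (0.75, 0.833...): only 1 false alarm left
```

A specificity gain at equal sensitivity means fewer patients without a tear get flagged, which is exactly the "no unnecessary follow-up" point made below.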
[00:40:48] That means if a patient came in without an ACL tear, I'd be able to find that out better. So in the future... yes, question? [00:41:03] [A student asks whether seeing the model's output biases what the radiologist actually looks at in the exam.] Yeah, so that's a good question, and I think the term automation bias captures a lot of this: once we have models working with experts together, can we expect that the experts will sort of take it less seriously? That's a big concern: that they start relying on what the model says, and say, I won't even look at this exam, I'm just going to trust what the model says blindly. That's absolutely possible, and a very open area of research. [00:41:46] Some of the ways that people have tried to address it is to say: you know what, from time to time I'm going to pass in an exam to the
radiologist for which I'm going to flip [00:41:58] the answer, and I'll know the right one, and if they get that wrong, I'll alert them: you're relying too much on the model, stop. But there are a lot of more sophisticated ways to go about addressing automation bias, and as far as I know it's a very open field of research, especially as we're getting into deep learning assistants. [00:42:23] And one utility of this is to say, basically, this set of patients doesn't need follow-up: let's not send them for unnecessary surgery. Great. So I shared three case studies from the lab; the final thing I want to do is talk a little bit about how you can get involved if you're interested in applications of AI to healthcare. [00:42:52] The first is the ability for you to just get your hands dirty with datasets and be able to try out your own model. We have, from our lab, released the MURA dataset, which is a
large dataset of bone x-rays, and the [00:43:09] task is to be able to tell if the x-rays are normal or not. They come from different parts of the upper body, and that's what the dataset's x-rays look like. This is a pretty interesting setup, because you have more than one view, more than one angle, for the same body part, for the same study, for the same patient, and the goal is to be able to combine these well in a convolutional neural network and output the probability of abnormality. [00:43:45] One of the interesting things here, for transfer learning as well, is: do you want to train the models differently per body part, do you want to train the same model for all body parts, or do you want to combine certain models? There are a lot of design decisions there. And this is what trained models look like: a baseline model that
[00:44:08] we released, which is able to identify a fracture here and a piece of hardware on the right. And you can download the dataset from our website: if you google "MURA dataset", or go to our website, stanfordmlgroup.github.io, you should be able to find it. [00:44:28] The second way to get involved is through the AI for Healthcare Bootcamp, which is a two-quarter-long program that our lab runs. It provides students coming out of classes like CS230 an opportunity to get involved in research: students receive training from PhD students in the lab and medical school faculty to work on structured projects over two quarters, and if you have a background in AI, which you do, then you're encouraged to apply. We're working on a wide set of problems across radiology, EHR, public health, and pathology right [00:45:12] now. This is what the lab looks like; we have a lot of fun. [00:45:22] And the applications for the
bootcamp, [00:45:27] starting in the winter, are now open. The early-application deadline is November 23rd, and you can go to this link and apply. So that's my time; thank you so much for having me. [00:45:55] I'll take a couple of questions. [A student asks: you answered a question about privacy concerns; in terms of other ethics concerns, what about compensation for the medical experts you're potentially putting out of business with a model like the one you're developing, and in general, given that their knowledge is being used to train these models? It's not free.] [00:46:19] Yeah, so the question was: we're having these automated AI models trained with the knowledge of medical experts; what are the ways in which we're thinking of compensating these medical experts, right now or in the future, when we have possibly automated models? I think a lot of people are thinking about these
[00:46:44] problems and working on them right now. There are a variety of approaches that people are thinking about in terms of economic incentives, and there's a lot of fear about whether AI will actually work with, or augment, experts in whatever field they're working in. I don't have a great silver bullet for this, but I know there's a lot of work going on there. [00:47:18] [A student asks: when you're reading through MRIs, you showed looking at four or five categories of issues; it's possible that a human looking at the exam could point out something that was not being looked at by the AI model at that time, so how do you handle that?] Yeah, that's a great question. Just to repeat the question: we're looking at MR exams, and we're saying that for these three pathologies we're able to output the probabilities; what happens if there's another
pathology [00:47:54] that we haven't looked at? So I have a couple of answers for that. The first is that one of the categories here was simply to tell whether the exam was normal or abnormal, so the idea is that the abnormality class will capture a lot of different pathologies, at least the ones seen at Stanford. But it's often the case that we're building for one particular pathology, and then there's obviously a burden on the model and the model developers to be able to convey: hey, look, our model only does this, and you really need to watch out for everything else that the model doesn't cover. [00:48:33] Maybe that's it, unless there's one more question? No? All right, that's the last question we'll take then; thank you once again. [00:48:48] So now you've got... is the microphone working? Yeah. Now you've got the perspective of an AI researcher
working [00:48:57] in healthcare; now you are going to be the AI researcher working in healthcare. We're going to go over a case study that is targeted at skin disease. You know, in order to detect skin disease, sometimes you take pictures, microscopic pictures of cells on your skin, and then analyze those pictures; that's what we're going to talk about today. [00:49:15] So let me give the problem statement: you're a deep learning engineer, and you've been chosen by a group of healthcare practitioners to determine which parts of a microscopic image correspond to a cell. Here's how it looks. It's not actually a black-and-white image; it's a color image that looks black and white. The input image is the one that is closer to me, and the yellow one is the ground truth, which has been labeled by a doctor, let's say. So what you're trying to do
is to segment those cells on this image. [00:49:57] Have we talked about segmentation yet? A little bit. Segmentation is about producing a class for each of the pixels of an image. So in this case, each pixel would correspond to either no cell or cell, zero or one, and once we output a matrix of zeros and ones telling us which pixels correspond to a cell, we should hopefully get a mask like the yellow mask that I overlapped with the input image. Does that make sense? [00:50:34] [A student asks about the color image, where you don't have the boundaries for each cell.] Yeah, we'll talk about the boundaries later, but right now assume it's a binary segmentation, so 0 and 1, no cell and cell. [00:50:49] Okay, so this is going to be very interactive; I think we're going to use Menti for several questions and group you guys into groups of three. So here are other examples of images that were segmented with a mask. Now, doctors have collected 100,000 images
coming from microscopes, but the images [00:51:10] come from three different microscopes: there is a type A, a type B, and a type C microscope, and the data is split between these three as 50% for type A, 25% for type B, and 25% for type C. [00:51:24] The first question I have for you is: given that the doctors want to be able to use your algorithm on images from the microscope of type C (this microscope is the latest one, it's the one that is going to be used widely in the field, and they want your network to work on this one), how would you split your dataset into train, dev, and test sets? Please group into teams of two or three and discuss it for a minute. [00:53:02] You can start going on Menti and writing down your answers as well. Okay, take 30 seconds to input your insights on Menti; you can do one per team, [00:53:37] and we'll start going over some of the
answers here. [00:53:43] Okay, the first is: split C, train on A plus B, 20k in train, 2.5 in dev and test. Train 80 on all, 5k C dev, 10k C test. 95/5, where test and dev are from the population we care about. I think these are good answers; there is no perfect answer here, but there are two things to take into consideration. You have a lot of data, so you probably want to split it closer to 95/5 than to 60/20/20. And most importantly, you want to have C images in the dev and test sets, and the same distribution between those two; that's what you've seen in the third course. We would also prefer to have C images in the train set: you want your algorithm to have seen C images. [00:54:40] So I would say a very good answer is this one, 90/5/5, where the five and five are exclusively from C, and you also have C images in the 90% of training images.
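A split along those lines can be sketched as follows; the exact counts (5k dev, 5k test, everything else in train) are my reading of the numbers above, not a prescription.

```python
import random

random.seed(0)

# Hypothetical image ids: 50k type A, 25k type B, 25k type C (100k total).
data = ([("A", i) for i in range(50_000)]
        + [("B", i) for i in range(25_000)]
        + [("C", i) for i in range(25_000)])

# Dev and test come only from type C, the microscope we care about,
# so the two sets share the deployment distribution.
c_imgs = [d for d in data if d[0] == "C"]
random.shuffle(c_imgs)
dev, test = c_imgs[:5_000], c_imgs[5_000:10_000]

# Everything else, including the remaining type-C images, goes to train,
# so the algorithm has seen C images during training.
held_out = set(dev) | set(test)
train = [d for d in data if d not in held_out]

print(len(train), len(dev), len(test))  # 90000 5000 5000
```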
Any other insights [00:54:57] on that? [A student asks how to attack the case where microscopes A and B have, say, hidden features that mess things up.] Yeah, so there is one more thing we didn't talk about here, which is: how do we know what the distribution of microscope A images and microscope B images is versus microscope C? Do they look like each other? If they do, all good; if they don't, how can we make sure the model doesn't get bad hints from those distributions? [00:55:25] Another thing is data augmentation: we could augment this dataset as well and try to get as many C-distribution images as possible; we're going to talk about that. Okay, so: the split has to be roughly 95/5, not 60/20/20; the distribution of the dev and test sets has to be the same, and they contain images from C; and there should also be C images in the training set. [00:55:48] Now, talking about data augmentation: do you
Do you think you can augment this data? If yes, give three methods you would use; if no, explain why you cannot. Do you want to take 30 seconds to talk about it with your neighbors?

[00:57:38] Okay guys, let's go over some of the answers. Rotation, zoom, blur: looking at the images of the cells that we have, these might work very well. Rotation, zoom, blur, translation, combinations of those, stretching, symmetry; probably a lot of those work. One follow-up question that I have is: can someone give an example of a task where the augmentation might hurt the model rather than helping it?

[00:58:15] [Student: if you want to overfit on the test set?] Can you be more precise? [Student: when you don't want your model to generalize.] Oh, you don't want your model to generalize too much. Okay, yeah, there are some cases where you don't want the model to generalize too much, especially, you know, an autoencoder.
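The augmentations listed above (rotation, flips, translation, random noise) are easy to hand-roll in numpy as a sketch; a real pipeline would typically use an image-augmentation library, and the toy image here is made up.

```python
import numpy as np

def augment(img, rng):
    """Return a list of simple augmented variants of an H x W x 3 image."""
    return [
        np.rot90(img, k=1),                       # 90-degree rotation
        np.fliplr(img),                           # horizontal flip (symmetry)
        np.flipud(img),                           # vertical flip
        np.roll(img, shift=3, axis=1),            # crude translation (wraps around)
        np.clip(img + rng.normal(0, 0.05, img.shape), 0.0, 1.0),  # random noise
    ]

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))        # toy image with pixel values in [0, 1]
variants = augment(img, rng)       # five augmented copies of the input
```

Note that, as discussed next, flips are only safe when labels are flip-invariant; for characters like b/d or digits like 6/9 they silently corrupt the labels.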
But any other ideas? [Student: if you're doing face detection, would the face ever be upside down, or on either side?] I see. So if you do face detection, you probably don't want the face to be upside down, although you never know, depending on the use. It's not going to help much if the camera is always oriented like that and it's filming humans that are not upside down. But I don't think it's going to hurt the model; it's probably just not going to help it.

[00:59:02] Yeah, if you really stretch the image. So there are algorithms, like, you know, FlowNet, an algorithm that's used on videos to detect the speed of a car. If you stretch the images, you probably cannot detect the speed of the car anymore. Any other examples?

[00:59:34] Character recognition. I think it's a good example: let's say you're trying to detect b's and d's and you do a symmetric flip. You get that everything that was labeled "b" is now a "d", and everything that was "d" is a "b"; for nine and six it's the same story. So these data augmentations are actually hurting the model, because you don't relabel when you augment your data. Okay.

[01:00:05] So yeah, many augmentation methods are possible: cropping, adding random noise, changing contrast. I think data augmentation is super important. I remember a story of a company that was working on self-driving cars, and also on virtual assistants in cars, you know, the type of interaction you have with a virtual assistant in your car. They noticed that the speech recognition system was actually not working well when the car was going backwards. No idea why; this doesn't seem related to the speech recognition system of the car. They tested it and looked, and they figured out that people were putting their hands on the passenger seat, looking back, and talking to the virtual assistant; and because the microphone was in the front, the voice was very different when you were talking toward the back of the car rather than the front. So they used data augmentation to augment their current data: they didn't have data of that type, of people talking toward the back of the car. By augmenting smartly, you can change the voices so that they sound as if they came from someone talking toward the back of the car, and that solved the problem.

[01:01:19] Okay, a small question, we can do it quickly: what is the mathematical relation between n_x and n_y? Remember, we have an RGB image that we can flatten into a vector of size n_x, and the output is a mask of size n_y. What's the relationship between n_x and n_y?
Someone wants to go for it? [Student: they're equal.] Who thinks they're equal? They're not equal, and why? Because you have RGB on the input side: n_x will be 3 times n_y, because you have RGB images, and for each RGB pixel you have one output, 0 or 1. Okay, that was a question on one of the midterms; it was a complicated question. What's the last activation of your network? Sigmoid: you probably want an output between 0 and 1. And if you had several classes (later on we will see that we can also segment per disease), then you would have a softmax. What loss function should we use? I'm going to give it to you, to go quickly because we don't have too much time: you're going to use a binary cross-entropy loss over all the entries of the output of your network. Does that make sense? Always thinking through the loss function is interesting. Okay.
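A small numpy sketch of this setup, with made-up sizes and random values: an RGB input flattened to a vector of size n_x = 3·n_y, a per-pixel sigmoid, and binary cross-entropy averaged over all output entries.

```python
import numpy as np

h, w = 4, 4                              # tiny toy image
x = np.random.rand(h * w * 3)            # flattened RGB input, size n_x
n_x, n_y = x.size, h * w                 # one output per pixel: n_x = 3 * n_y

logits = np.random.randn(n_y)            # raw network outputs, one per pixel
y_hat = 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> values in (0, 1)
y = np.random.randint(0, 2, size=n_y)    # ground-truth 0/1 mask

# Binary cross-entropy averaged over all output entries.
eps = 1e-12
bce = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
```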
[01:03:03] So, you have a first try: you've coded your own neural network, you've named the model M1, and you've trained it for a thousand epochs. It doesn't end up performing well; it looks like this: you give the input image to the model and get an output that is expected to be the following one, but it's not. So one of your friends tells you about transfer learning, and about another labeled dataset of one million microscope images that have been labeled for skin disease classification, which are very similar to those you want to work with from microscope C. A model M2 has already been trained by another research lab on this new dataset, on a 10-class disease classification. Here is an example input/output of the model: you have an input image that probably looks very similar to the ones you're working with; the network has a certain number of layers and a softmax classification at the end that gives you the probability distribution over the diseases that seem to correspond to this image. So they're not doing segmentation anymore, right? They're doing classification. Okay, so the question here is going to be: you want to perform transfer learning from M2 to M1. What are the hyperparameters that you will have to tune? It's more difficult than it looks, so think about it, discuss with your neighbors for a minute, and try to figure out the hyperparameters involved in this transfer learning process.

[01:05:59] Okay, take 15 more seconds to wrap it up. [01:06:11] Okay, let's see what you guys have. "Learning rate": it is a hyperparameter; I don't know if it's specific to transfer learning. "Weights of the last layers": I don't think that's a hyperparameter; weights are parameters. "New cost function for additional output layers": the choice of the loss you might count as a hyperparameter, but I don't think it's specifically related to transfer learning; you will have to train with the loss you've used on your model M1. "Number of new layers": yeah. "Weights of the new layers": not a hyperparameter. Okay, last one: "whether to train layers of M2". So do we train, what do we fine-tune; it's a lot about layers, actually. "Size of added layers": not sure. Okay, let's go over it together, because it seems that there are a lot of different answers here. I'll try to write it down here.

[01:07:18] Let's say we have the model M2 (is the board big enough?): we give it an input image, and the model M2 gives us a probability distribution, a softmax. So we have a softmax here. You will agree that we probably don't need the softmax layer; we don't want it, because we want to do segmentation. So one thing we have to choose is how much of this pretrained network (because it is a pretrained network) we keep. Let's say we keep these layers, because they probably know the inherent salient features of the dataset, like the edges of the cells that we're very interested in. So we take them, and you will agree that here we have a first hyperparameter, L: the number of layers from M2 that we take.

[01:08:21] Now, what other hyperparameters do we have to choose? We probably have to add a certain number of layers here in order to produce our segmentation, so there's another hyperparameter, L0: how many layers do I stack on top of this? And remember, these layers are pretrained, but these new ones are randomly initialized. Does that make sense? So, two hyperparameters. Anyone see the third one?

[01:09:06] The third one comes when you decide to train this new network. You have the input image, give it to the network, and get the output segmentation mask (let's say "seg mask"). What you have to decide is how many of these pretrained layers you will freeze. Probably, if you have a small dataset, you'd prefer keeping the features that are here, freezing them, and focusing on retraining the last few layers. So there is another hyperparameter, LF: how much of this will I freeze? What does it mean to freeze? It means that during training I don't train these layers: I assume they've been seeing a lot of data already, they understand very well the edges and less complex features of the data, and I'm going to use my small dataset to train the last layers. So, three hyperparameters: L, L0, and LF. Does that make sense? Okay, so this is for transfer learning.
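The three hyperparameters just named, L pretrained layers kept from M2, L0 new layers stacked on top, and LF of the kept layers frozen, can be illustrated schematically. The layer objects below are toy stand-ins (plain dicts, not any framework's API), and the sizes are arbitrary.

```python
def build_transfer_model(m2_layers, L, L0, LF):
    """Keep the first L layers of pretrained M2, freeze the first LF of them,
    and stack L0 randomly initialized layers on top."""
    assert 0 <= LF <= L <= len(m2_layers)
    kept = [dict(layer, frozen=(i < LF)) for i, layer in enumerate(m2_layers[:L])]
    new = [{"name": f"new_{i}", "pretrained": False, "frozen": False}
           for i in range(L0)]
    return kept + new

# Pretend M2 has 8 pretrained layers (the softmax head already dropped).
m2 = [{"name": f"m2_{i}", "pretrained": True} for i in range(8)]
model = build_transfer_model(m2, L=6, L0=2, LF=4)
```

In a real framework the same idea is expressed by loading M2's weights, truncating the head, setting a `trainable`/`requires_grad` flag to false on the first LF layers, and appending fresh layers.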
[01:10:13] The question was more complicated than it looked. Okay, let's move on. Where am I? Okay, let's go over another question. So, this we did. Now it's interesting, because here we have an input image; in the middle we have the output that the doctor would like; but on the right you have the output of your algorithm. You see that there is a difference between what they want and what we're producing, and it goes back to what someone mentioned earlier: there is a problem here. How do you think you can correct the model and/or the dataset to satisfy the doctor's request? The issue with this image is that they want to be able to separate the cells from one another, and they cannot do it based on your algorithm; it's still a little hard, there's something to add. Can someone come up with the answer? Actually, you mentioned one of the answers; do you want to explain it, so that we can finish this lecture?

[01:11:17] [Student: it looks like you could have, say, three cells on the bottom left blurring together; if you add boundaries, it makes the cells more well-defined.] Good answer. So one way: when you labeled your dataset originally, you labeled every pixel with zeros and ones. Now, instead, you will label with three classes: zero, one, or boundary; let's say 0, 1, and 2 for boundary. Or even, the best method I would say, for each input pixel the output will be the corresponding label: p(cell), p(boundary), and p(no cell). What you will do is, instead of having a sigmoid activation, you will use a softmax activation, and the softmax will be applied per pixel.
One other way, if it still doesn't work: suppose you relabel your dataset taking the boundaries into account, and the model still doesn't perform. What is another way to do it? I think it's all about the weighting of the loss function. It's likely that the number of pixels that are boundaries is much smaller than the number of pixels that are cells or no-cells, so the network will be biased toward predicting cell or no-cell. Instead, what you can do is, when you compute your loss function, give it three terms: one binary cross-entropy for no-cell, one for cell, and one for boundary, summed over i = 1 to n_y, over all the output pixel values. Then you attribute a coefficient to each of those, alpha, beta, and 1, and by tweaking these coefficients, if you put a very high coefficient on the boundary term,
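One way to realize this weighted per-pixel loss is a class-weighted cross-entropy, sketched below in numpy. The shapes, random inputs, and the specific weight values are illustrative only, not from the lecture; the weights play the role of the alpha, beta, 1 coefficients.

```python
import numpy as np

def weighted_pixel_loss(logits, labels, class_weights):
    """Per-pixel softmax cross-entropy, with one weight per class.

    logits: (n_pixels, 3) raw scores for [no-cell, cell, boundary]
    labels: (n_pixels,) integer class per pixel
    class_weights: length-3 coefficients, e.g. [alpha, beta, 1]
    """
    z = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_true = probs[np.arange(len(labels)), labels]     # prob of the true class
    w = np.asarray(class_weights)[labels]              # weight of each pixel's class
    return np.mean(-w * np.log(p_true + 1e-12))

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 3))
labels = rng.integers(0, 3, size=16)
# Up-weight boundary pixels (class 2) so missing a boundary costs more.
loss = weighted_pixel_loss(logits, labels, class_weights=[1.0, 1.0, 10.0])
```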
it means you're telling your model to focus on the boundary; you're telling the model that if it misses the boundary, it's a huge penalty, and we want it to train by figuring out all the boundaries. That's another trick that you could use. One question on that? Yeah, good question: what do I mean by relabeling your dataset? In the last assignment section you've been labeling bounding boxes, you know, for the YOLO algorithm. The same tools are available for segmentation: you have an image and you draw the different lines. In practice, with the tool you were using, the line you drew would just count as cell: everything inside what you draw, plus the boundary, would count as cell, and the rest as no-cell. It's just a line of code to make it different, so that the line you drew counts as boundary, everything inside counts as cell, and everything outside counts as no-cell. So it's the way you use your labeling tool, that's all.

[01:14:31] [Student question about alpha and beta.] I think they're not learnable parameters; they're hyperparameters to tune. The same way you tune lambda for your regularization, you would tune alpha and beta.

[01:14:53] And this is not an attention mechanism, because it's just a training trick, I would say. It cannot tell you, for each image, how much the model is looking at this part versus that part; it's not going to tell you that. It's just a training trick.

[01:15:10] [Student: what's the advantage of doing it this way, versus object detection, like the tank thing?] So the question is: what's the advantage of segmentation rather than detection? Detection means you want to output a bounding box. If you output a bounding box, what you could do is crop it out, then analyze the cell and try to find the contour of the cell. But if you want to separate the cells, if you want to be very precise, segmentation is going to work well; if you want to be very fast, bounding boxes will work better. And, as you guys saw, segmentation does not run as fast as the YOLO algorithm does for object detection, I would say, but it's much more precise.

[01:15:55] Okay, so: modify the dataset in order to label the boundaries; on top of that, you can change the loss function to give more weight to boundaries, or penalize false positives.

[01:16:06] Okay, we have one more slide, I think, so let's go over it. Now the doctors give you a new dataset that contains images similar to the previous ones; the difference is that each image is now labeled with zero or one, zero meaning there are no cancer cells in that image, and one meaning there is at least one cancer cell in the image. So we're not doing segmentation anymore; it's binary image classification: cancer or no cancer. Okay, so you easily build a state-of-the-art model, because you're a very strong person in classification, and you achieve 99% accuracy. The doctors are super happy, and they ask you to explain the network's predictions: given an image classified as one, how can you figure out based on which cell the model predicted one? I've talked a little bit about that, and there are other methods that you should be able to figure out right now, even if you don't know class activation maps.

[01:17:15] To sum it up: we have an input image, we put it into the neural network, which is a binary classifier, and the network says one. You want to figure out why the network said one, based on which pixels. What do you do? [Student: visualize the weights.] What do you visualize in the weights? I think visualizing the weights is not related to the input; the weights are not going to change based on the input, and here you want to know why this input led to one. So it's not about the weights, but good idea. So, you get the one here; this is y-hat, and it's not exactly one, let's say it's a 0.7 probability. What you can remember is this: the derivative of y-hat with respect to x is what? It's a matrix of the same shape as x, and each entry of the matrix is telling you how much moving that pixel influences y-hat. You agree? So the top-left number here is telling you how much x_1 is impacting y-hat. Is it, or not? Maybe it's not: if you have a cat detector and the cat is here, you can change this pixel and it's never going to change anything, so the value here is going to be very small, close to zero.
cancer cell is here you will see high [01:18:59] cancer cell is here you will see high number in this part of the matrix [01:19:01] number in this part of the matrix because this this these are the pixel [01:19:03] because this this these are the pixel that if we move them it will change Y [01:19:05] that if we move them it will change Y hat does it make sense it's a quick way [01:19:07] hat does it make sense it's a quick way to interpret your network it doesn't [01:19:09] to interpret your network it doesn't it's not too too good like you're not [01:19:13] it's not too too good like you're not gonna have tremendous results but you [01:19:14] gonna have tremendous results but you should see these pixels have higher [01:19:16] should see these pixels have higher derivative values than the others okay [01:19:20] derivative values than the others okay that's one way and then we will see in [01:19:21] that's one way and then we will see in two weeks how to interpret neural [01:19:24] two weeks how to interpret neural networks visualizing the weights [01:19:25] networks visualizing the weights included and all the other methods okay [01:19:31] included and all the other methods okay so gradient with respect to your model [01:19:33] so gradient with respect to your model detects cancer cells from the test set [01:19:37] detects cancer cells from the test set images with 99% accuracy while a doctor [01:19:40] images with 99% accuracy while a doctor would on average perform 97% on the same [01:19:43] would on average perform 97% on the same task is this possible or not [01:19:47] who thinks it's possible to have a [01:19:50] who thinks it's possible to have a network that achieves more accuracy on [01:19:52] network that achieves more accuracy on the test set than the doctor okay can [01:19:55] the test set than the doctor okay can someone can someone say why you have an [01:20:03] someone can someone say why you have an explanation [01:20:11] okay the network probably 
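The input-gradient idea described above — the derivative of ŷ with respect to x as a per-pixel influence map — can be sketched with a toy model. This is a minimal sketch under assumptions, not the lecture's code: a one-layer logistic "classifier" is assumed so that ∂ŷ/∂x has a closed form; for a real deep network you would get the same quantity from an autodiff framework rather than by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def saliency_map(x, w):
    """Input-gradient saliency for a toy logistic classifier y_hat = sigma(<w, x>).

    For this simple model the derivative has a closed form:
        d y_hat / d x = y_hat * (1 - y_hat) * w
    so the saliency map has the same shape as the input image x.
    A real network would compute the same quantity via autodiff
    (e.g. tape.gradient / loss.backward), not by hand.
    """
    y_hat = sigmoid(np.sum(w * x))
    return y_hat * (1.0 - y_hat) * w

# Toy 4x4 "image" whose informative pixels (the "cancer cell") sit in the
# bottom-right 2x2 block; the classifier's weights are zero elsewhere.
w = np.zeros((4, 4))
w[2:, 2:] = 1.0
x = np.random.rand(4, 4)

s = np.abs(saliency_map(x, w))
# Pixels the prediction depends on get large |d y_hat / d x|; pixels the
# model ignores (like the corner of a cat-free region) stay at ~0.
print(s[0, 0], s[3, 3])
```

With an actual CNN the recipe is the same: run the forward pass, take the gradient of the predicted score with respect to the input pixels, and look at where its magnitude concentrates.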
[01:20:13] Okay — the network probably looks at complex things that the doctor didn't see; that's what you're saying. Possibly. I think there's a more rigorous explanation. [01:20:30] So here we're talking about Bayes error, human-level performance, and all that stuff — that's where you should see it. One thing is that there are many concepts you will see in Course 3 that are actually implemented in industry, but it's not because you know them that you're going to recognize when it's time to use them, and that's what we want you to get to. Like, now, when I ask you this question, you have to think about Bayes error, human-level accuracy, and so on. [01:20:53] So the question you should ask here is: how was the dataset labeled — where were the labels coming from? If the dataset was labeled by individual doctors, I think it's very weird that the model performs better on the test set than what doctors have labeled, simply because the labels are wrong three percent of the time on average. You're teaching wrong things to your model three percent of the time, so it's surprising that it gets better — it could happen, but it's surprising. [01:21:28] But if every single image of the dataset has been labeled by a group of doctors — I've talked about it — then the average accuracy of this group of doctors is probably higher than one doctor's; maybe it's 99 percent, in which case it makes sense that the model can beat one doctor. Does this make sense? So you have a Bayes error you're trying to approximate, which is like the best error you can achieve. A group of doctors is probably better than one doctor; this is your human-level performance.
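The point that a panel of doctors yields better labels — and hence a higher human-level reference — than a single doctor can be checked with a quick simulation. This is a hedged sketch: the 3% per-doctor error rate comes from the example above, but the independence assumption and the 5-doctor majority vote are made-up modeling choices, not something specified in the lecture.

```python
import random

random.seed(0)

DOCTOR_ERR = 0.03   # single doctor mislabels 3% of images (from the example)
PANEL = 5           # panel size (assumed)
N = 100_000

def doctor_label(truth):
    # A doctor flips the true label with probability DOCTOR_ERR.
    return truth if random.random() > DOCTOR_ERR else 1 - truth

single_correct = 0
panel_correct = 0
for _ in range(N):
    truth = random.randint(0, 1)
    if doctor_label(truth) == truth:
        single_correct += 1
    # Majority vote of PANEL independent doctors.
    votes = sum(doctor_label(truth) for _ in range(PANEL))
    majority = 1 if votes > PANEL / 2 else 0
    if majority == truth:
        panel_correct += 1

print(f"single doctor label accuracy: {single_correct / N:.4f}")  # ~0.97
print(f"5-doctor majority accuracy:  {panel_correct / N:.4f}")    # ~0.9997
```

Under these (strong) independence assumptions the majority label is wrong only when 3 or more of the 5 doctors err, which is rare; correlated mistakes between doctors would shrink the gap.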
[01:21:55] And then you should be able to beat one doctor.

[01:22:00] Okay, so you want to build a pipeline that goes from an image taken by the front of your car to a steering direction, for autonomous driving. What you could do is send this image to a car detector that detects all the cars, and a pedestrian detector that detects all the pedestrians, and then give those to a path planner, let's say, that plans the path and outputs the steering direction. So it's not end-to-end; end-to-end would be: I have an input image, and I want this output directly. [01:22:49] So a few disadvantages of this: something can go wrong anywhere in the model, and how do you know which part of the model went wrong? Can you tell me which part? I give you an image, the steering direction is wrong — why?

[01:23:21] Good idea — looking at the different components. So what you can do is look at what happens here and there. [01:23:32] Based on this image, do you think the car detector worked well or not? You can check it. Do you think the pedestrian detector worked well or not? You can check it. If there is something wrong there, it's probably one of these two items — it doesn't mean this one is good, it just means that those two are wrong. How do you check that this one is good? You can label ground-truth images and give them as input to the path planner, and figure out whether it gets the steering direction right or not. If it does, it seems the path planner is working; if it does not, it means there's a problem there. [01:24:04] Now, what if every single component seems to work properly — let's say these two work properly — but there is still a problem? It might be because what you selected as a human, the intermediate representation, was wrong, so the path planner cannot get the steering direction correct.
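The component-by-component debugging just described — swap ground-truth detections into the downstream stage to see which stage is at fault — can be sketched like this. Everything here (the stub detectors, the toy planner rule, the data format) is hypothetical scaffolding so the isolation logic itself is runnable, not CS230 code:

```python
def car_detector(image):
    # Pretend this model is buggy: it misses every car.
    return []

def pedestrian_detector(image):
    # Pretend this model is perfect: it returns the ground truth.
    return image["true_pedestrians"]

def path_planner(cars, pedestrians):
    # Toy rule: steer away if any obstacle is detected.
    return "left" if (cars or pedestrians) else "straight"

def debug_pipeline(example):
    """Localize the fault by feeding ground truth into each stage."""
    # 1) Full pipeline output vs expected steering.
    end_to_end = path_planner(car_detector(example),
                              pedestrian_detector(example))
    # 2) Planner fed ground-truth detections: isolates the planner.
    planner_only = path_planner(example["true_cars"],
                                example["true_pedestrians"])
    if planner_only != example["steering"]:
        return "path planner (or the chosen intermediate representation)"
    if end_to_end != example["steering"]:
        return "one of the detectors"
    return "no fault found"

example = {"true_cars": ["car@right"], "true_pedestrians": [],
           "steering": "left"}
print(debug_pipeline(example))  # -> "one of the detectors"
```

The planner gets the right answer when given perfect detections, so the error must come from upstream — the same reasoning as labeling ground-truth images and feeding them to the path planner by hand.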
Based on only the pedestrian and the car detections it may not be able to — it probably needs the stop signs and stuff like that as well. So because you made hand-engineering choices here, your model might go wrong; that's another thing. [01:24:36] And an advantage of this type of pipeline is that data is probably easier to find for each individual algorithm than for the whole end-to-end pipeline. If you want to collect data for the entire pipeline, you would need to take a car, put a camera on the front, and build a kind of steering-wheel-angle detector that measures your steering wheel at every step while you're driving. So you'd basically need to drive everywhere with a car that has this feature — it's pretty hard, and you need a lot of data, a lot of roads. While with this one, you can collect images anywhere, label the pedestrians on them, and detect cars by the same process. [01:25:20] Okay, so these choices also depend on what data you can access easily and what data is harder to acquire. Any questions on that?

[01:25:34] You're going to learn about convolutional neural networks now — we're going to have fun with a lot of imaging. You have a quiz and the programming assignments for the first module, and the same for the second module. The midterm is next Friday, not this one. [01:25:47] Everything up to C4M2 will be included in the midterm — so up to the videos you're watching this week — and that includes the TA sections (this week's and the next one) and every in-class lecture, including next Wednesday's. This Friday you have a TA section. Any questions on that? [01:26:07] Okay, see you next week, guys. ================================================================================ LECTURE 006 ================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 6 - Deep Learning Project Strategy Source: https://www.youtube.com/watch?v=G5FNYxbW_Qw --- Transcript [00:00:05] All right, hey everyone, welcome back. Can people hear me? Okay, all right.
[00:00:13] So as usual, you can take a second to enter your SUID so we know who's here.

[00:00:17] So today's lecture will be a Choose Your Own Adventure lecture. I think by now you've learned a lot about the technical aspects of building learning algorithms, and then in the third set of modules you saw some of the principles for debugging learning algorithms and how to actually use these tools in order to be efficient in how you build a machine learning application. What I want to do today is step through with you a moderately complicated machine learning application, and throughout today's lecture I'm going to step you through a scenario and then ask you to kind of choose your own adventure — if you were working on this project, what would you do next? — to give you more practice, in the hour and a bit that we have, on thinking through machine learning strategy.

[00:01:17] And you know, I've seen in so many projects that there are sometimes things a less strategically sophisticated team will take a year to do, but if you're actually very strategic and very sophisticated in deciding what to do next — how to drive a project forward — I've seen many times that what one team takes a year to do, another could do in a month or two. And if you're trying to, I don't know, write a research paper, or build a business, or build a product, the ability to drive a machine learning project forward quickly gives you a huge advantage — and you make much more efficient use of your life as well, right?

[00:02:00] So for today I'm going to pose a scenario — pose a machine learning application — and say: all right, you are the CEO of this project; what are you going to do next? But I'd like today's meeting to be quite interactive as well, so can I get people to sit in groups of two or ideally three or so, maybe plus or minus one, and try to sit next to someone you don't work with all the time. If you're sitting next to your best friend — I'm glad your best friend is in the class with you, but go sit with someone else, because I've done this multiple times and the discussion is actually richer if you talk to someone you don't know super well. So take a second, introduce yourself, and reach your neighbor, I guess.

[00:02:48] The example I want to go through today is actually a continuation of the example I described briefly in the last lecture I taught, building a speech recognition system. [00:03:02] Remember, I briefly motivated this trigger-word, or wake-word, detection system last time. You know, I actually have both an Amazon Echo and a Google Home, but it's a lot of work to configure these things to turn your light bulbs on and off. So if you can build a chip to sell to, say, a lamp maker, to recognize phrases — let's say we call the lamp Robit — then you can recognize phrases like "Robit, turn on" and "Robit, turn off," and you could have a little switch to give the thing different names: you call it Robit, or Lena, or Alice, or something, so you can also have "Lena, turn on," "Lena, turn off." Give your lamp a name and just say, hey, "Robit, turn on." [00:03:54] So rather than detecting different names and turn on and turn off, for the technical discussion I'm just going to focus on
the phrase "Robit, turn on" — but it's kind of the same problem, which you'd need to solve like four times to support two names and both turn on and turn off. So I'm going to abbreviate "Robit turn on" as RTO — if you want to call your lamp Robit and tell it to turn on. I think I was inspired by Isaac Asimov, who wrote those robot novel series, and all his robots' names started with R — so maybe it's "robot, turn on."

[00:04:33] So let's say that you are the new CEO of a small startup with, you know, three people, and your goal is to build a learning algorithm that can recognize this phrase, "Robit, turn on," so that when someone buys this lamp and says "Robit, turn on," the lamp turns on. [00:05:03] To be CEO of this startup you'd need to do a lot of things — figure out the embedded circuitry, figure out who the lamp makers are, the sales, all that stuff — but for today let's just focus on the machine learning aspect of it.

[00:05:14] And so my first question to you is very open-ended — and this is the life of a CEO, right: you wake up one day and you just have to decide what to do. So my first, open-ended question is: you're going to show up at work tomorrow in your startup office, and you want to build a learning algorithm to detect the phrase "Robit, turn on" for this application. What are you going to do? Take a minute and answer that by yourself first — no, don't discuss with your neighbor yet — but you know, you're going to show up in your office and then start working on this engineering problem, to build a neural network to do this. [00:05:57] And do this as yourself, right? Don't pretend you're some hypothetical startup CEO with ten billion dollars to spend; just do it as, let's say, yourself. And I don't think this is a terrible startup idea — it's not the best idea, but I think this could work, so you're actually welcome to do it. But let's say you decide to do this, and you go into your office tomorrow: what do you do? Why don't you take, let's say, two minutes to enter an answer, and then we can discuss.

[00:06:31] In fact — yeah, yes — one thing I really like about that answer is actually the "review existing literature" part. In fact, when you're starting a new project — and I think, when you're starting a new
project like that, assuming you've not worked on trigger word detection before — reading research papers, or reading code on GitHub, or reading blog posts on the problem is actually a very good way to quickly level up your knowledge. [00:07:03] And in terms of your exploration strategy, I want to describe to you how I read research papers. [00:07:14] This is not a good way to review the literature: if the x-axis is time and the vertical axis is research papers, what some people will do is find the first research paper and read it until it's done, then go find the second research paper and read it until it's done, then go find the third — this very sequential way of reading research papers. [00:07:40] I find that the more strategic way to go through these resources — everything ranging from blog posts (there are lots of good Medium articles that explain things), to research papers, to GitHub — is to use a parallel exploration process. This is actually what it feels like when I'm doing research, when I'm trying to learn about a new field I'm not that expert in. I've actually done a lot of work on trigger word detection, but if I hadn't worked on it before, then I would probably find, you know, three papers — so again, the x-axis is time and the vertical axis is different papers — and read a few of them in parallel at a surface level, skim them, and based on that decide to read one in greater detail, then add other papers that I start skimming, maybe find another one I want to read in great detail, and gradually add new papers to my reading list — reading some to completion and some not to completion.

[00:08:39] Yeah, I was actually chatting with one of my friends, Pieter Abbeel, a former student, at Berkeley, who mentioned that he wanted to learn about a new topic, and he told me he was compiling a reading list of 200 research papers he wants to read. That sounds like a lot — you rarely read 200 papers — but I think if you read 10 papers you have a basic understanding, if you read 50 you have a pretty decent understanding, and if you read like 100, I think you have a very good understanding of a field. Often this is time well spent, I guess.

[00:09:14] And some other tips — again, I'm really thinking: if you really are CEO of this startup and this is what you want to do, what advice would I give you? When you're reading papers, other things to realize: one is that some papers don't
make sense, right? And that's fine; even I read some papers and just go, no, I don't think that makes sense. [00:09:39] It's not that uncommon for us to find papers from a decade ago where half of it was great and the other half talked about things that were not that important. So that's okay: usually papers are technically accurate, but often what the author thought was important, like maybe that using batch norm was really important for this problem, just turns out not to be the case. That happens sometimes. [00:10:06] And I think the other tactic that I see Stanford students sometimes not use enough is talking to experts, including contacting the authors. When I read a paper, I don't bother the authors unless I've actually tried to figure it out myself, right? But if you actually spend some time trying to understand the paper and it really doesn't make sense to you, [00:10:29] it is okay to email the authors and see if they respond. People are busy, so maybe there's a 50% chance they respond, and that's okay, because it takes you five minutes to write an email, and a 50% chance they get back to you could be time pretty well spent. [00:10:44] But don't bother people unless you've tried to do your own work; I actually get a lot of emails from high school students who do not feel like they've done their own work. So just don't bother people unless you've actually tried. [00:11:03] Cool. So after looking at the literature, and maybe downloading an open source implementation or getting a sense of an algorithm you want to try. Oh, and it turns out the trigger word detection literature is actually one literature where there isn't consensus on which algorithms are good and which are bad. [00:11:25] Despite all the trigger word, or wake word, detection systems that some of you may already use, there isn't actually consensus in the research community today on the best algorithm to try. [00:11:37] But let's say that you read some papers, downloaded some open source implementations, and now you want to start training your first system. [00:11:47] Last time we talked a little bit about how much time you would spend to collect data, and we said: spend a small amount of time, like a day or maybe two days at most, to collect your first data set to start training up a model. [00:12:00] But my next question to you is: what data would you collect?
Right, in particular, what train/dev/test data would you collect? [00:12:18] So you've decided on an initial neural network architecture, and you want to train something to recognize this phrase, "Robert turn on." [00:12:30] I don't think it's possible to download a data set for this; I don't think anyone has collected a data set with the words "Robert turn on" and posted it on the internet, so you have to collect your own data for this particular trigger phrase. [00:12:42] So as CEO of this startup, trying to build a neural net to detect the phrase "Robert turn on," what data do you collect? Again, take, let's say, three minutes to write an answer to this. [00:13:01] Yeah, I think this is an interesting one: record "Robert turn on" over and over, and then use data augmentation. [00:13:08] Data augmentation is one of those techniques that is a way to reduce variance in your learning algorithm, because you're generating more data. [00:13:19] And having worked on this problem, I happen to know data augmentation is very useful here. But if you didn't already know that fact, this is one of the things I would probably not do right away, because I would train a quick and dirty system and validate that you really have a high variance problem before investing the effort in data augmentation. [00:13:40] So data augmentation is one of those techniques that rarely hurts and usually helps, but I wouldn't bother making that investment unless you have collected the evidence that you actually have a high variance problem and that this is actually a good use of your time, right?
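As a concrete illustration, one common form of audio data augmentation is overlaying background noise on recorded clips. The sketch below is my own illustrative code, not the course's implementation: the function name, the SNR parameterization, and the raw-sample representation are all assumptions.

```python
import numpy as np

def augment_with_noise(clip, noise, snr_db=10.0, rng=None):
    """Overlay a random segment of background noise on an audio clip at a
    target signal-to-noise ratio (in dB).

    `clip` and `noise` are 1-D arrays of raw audio samples. Illustrative
    sketch only, not the course's actual augmentation code."""
    rng = rng or np.random.default_rng()
    # Pick a random offset so each augmented copy gets a different noise segment.
    start = int(rng.integers(0, len(noise) - len(clip) + 1))
    segment = noise[start:start + len(clip)]
    # Scale the noise so clip power / noise power matches the target SNR.
    clip_power = float(np.mean(clip ** 2))
    noise_power = float(np.mean(segment ** 2)) + 1e-12
    scale = np.sqrt(clip_power / (noise_power * 10 ** (snr_db / 10)))
    return clip + scale * segment
```

Each recorded clip can be turned into many training examples this way, but per the advice above, only invest in this after a quick-and-dirty model shows you actually have a high variance problem.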
Yeah, I think this one is actually nice: record everyone saying "Robert turn on" 100 times. The really nice thing about that is that you can get it done really quickly. [00:14:23] When I'm working with teams, I actually think in terms of hours, in terms of how long it would take us to do something, and this one you could probably do in like 30 minutes, so you get your data set collected in 30 minutes and get going. [00:14:37] Or, if you run around Stanford and just ask friends or strangers to speak into your laptop microphone, you spend a few hours to get a much bigger data set than would otherwise be possible for a startup. So I would probably do that; I would actually go and collect data for several hours rather than only spend 30 minutes. But this answer is pretty interesting as well, because it lets you get it done really quickly. Does that make sense? [00:15:03] So let me share some more concrete advice. Some time back, to prepare a homework problem that you'll see later in this course, Kian and Younes and I were actually building this system ourselves to create that homework, so this trigger word is a nice running example that we're using at a few points throughout this course. [00:15:30] So here's one thing you can do, and this is actually what we did (simplifying a little bit): collect 100 examples of 10-second audio clips. [00:15:55] It turns out that once you grab hold of someone and ask them to speak into your microphone, you can keep them for 3 seconds, which is how long it takes to say "Robert turn on," or you can keep them for 10 seconds; they're actually very willing to spend the extra seven seconds with you. [00:16:16] So this is 10 seconds of audio data, and audio is just patterns of little changes in air pressure, right? If you plot audio, the reason it looks like a waveform is that the way you're hearing my voice is that my voice, or the speakers, create very rapid changes in air pressure, and your ear measures those very rapid changes in air pressure and interprets the sound. A microphone is a sensitive device for recording these very high frequency changes in air pressure, and the plots you see of audio are just the air pressure at different moments in time. [00:16:52] So given a 10-second clip like this, if this is the 3-second section where they said "Robert turn on," then what you would like to do, say you're building a desk lamp that can sit here, is: the lamp stays turned off, off, off, off, and at the moment they finish saying "Robert turn on," you turn it on. So that is the output label y, really; before that, it's not detecting the phrase. [00:17:27] So what you want for the trigger word system is that at pretty much the moment they finish saying "Robert turn on," you want your learning algorithm to output a one; that's your target label y saying, yep, I just heard the trigger word. And at all other times you want it to output zero, because the one is the moment when you decide to turn on the lamp. [00:17:52] So to collect a data set, here's something you can do: collect 100 audio clips of 10 seconds each. [00:18:07] And when I'm prioritizing my work or my team's work, I really look at these numbers and think about how long each step will take.
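The labeling scheme just described (output 0 at every step except right after the speaker finishes the phrase) can be sketched as follows. The number of output steps and the width of the run of 1s are illustrative choices of mine, not the course's exact values; widening the 1s slightly is a common trick to keep positive labels from being vanishingly rare.

```python
import numpy as np

def make_target(n_steps, phrase_end_step, ones_width=3):
    """Target sequence y for one clip: 0 everywhere except a short run of
    1s starting at the step where the speaker finishes "Robert turn on".
    All sizes here are illustrative assumptions."""
    y = np.zeros(n_steps, dtype=np.int64)
    # Fire the label at the moment the phrase ends, for a few steps.
    y[phrase_end_step:phrase_end_step + ones_width] = 1
    return y

# One clip with 100 output steps, where the phrase ends at step 40:
y = make_target(100, 40)
```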
[00:18:15] Let's say you're running around Stanford and you want to collect 100 audio clips, maybe from 10 people at 10 clips per person, or maybe from 100 different people. I would actually estimate it like this: if you go to a Stanford cafeteria, how long does it take to get one person? You could probably get one person every minute or two at a busy place like a Stanford cafeteria, so you could probably get this done in 100 to 200 minutes, like two or three hours. That's not bad; you can get this done quite quickly. [00:18:47] So let's collect 100 audio clips, and actually, for the purposes of today, let's say you collect 100 audio clips to use for training, 25 for your dev set, and zero for the test set. [00:19:10] It's actually not that uncommon, if you're building a new product, to just not have a test set, because your goal is to build something that convinces you; in the early prototyping phases of a project, sometimes I don't bother with a test set. If it goes into a published paper, then of course you need a rigorously collected test set, but if you're just building a product and you don't need a rigorous evaluation, sometimes you can just get started without dealing with a test set, so it's pretty easy to get started. [00:19:54] All right, so taking that audio clip from above, one thing you can do to turn this into a supervised learning problem is the following. The phrase "Robert turn on" can be said in less than 3 seconds, so let's say you take 3 seconds as the duration of audio. Let's say here is when "Robert turn on" was said; then what you can do is write the target labels, 1s and 0s, and then
clip out different audio clips of 3 seconds. [00:20:32] So here's one audio clip; you can take that audio clip, this is x, and the target label is zero, because "Robert turn on" was not said. [00:20:46] And you can take another audio clip, a different randomly clipped 3-second window, and that clip also has the target label zero. [00:21:02] And for this one, a 3-second clip that ends right at the last part of the "on" sound, you would have a target label of one. [00:21:14] When you learn about sequence models and RNNs you'll learn a better method than this explicit clipping, but for now let's say you take these audio clips: take a 10-second clip, and by clipping out different random windows, you can take your, let's say, 100 clips and, because each 10-second clip yields multiple windows, turn this into, let's say, 3,000 training examples. [00:21:47] So here I took a 10-second clip and showed you three different 3-second windows, but if you take 30 3-second windows, then each 10-second audio clip becomes 30 examples. And now you've turned the problem into a binary classification problem, where you need to train a neural network that inputs a 3-second clip and labels it as either zero or one. Does that make sense? [00:22:13] So this is an example of the more complex pipelines you might have when you're building a learning algorithm: taking a continuous audio detection problem and turning it into a binary classification problem, which you've learned how to build various neural networks for. And again, when you learn about RNNs, you'll learn about other ways to process sequence data, or temporal data. [00:22:36] Okay.
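The clipping procedure above can be sketched roughly like this. The sample rate, the number of windows per clip, and the labeling tolerance are assumptions for illustration, not the lecture's exact numbers.

```python
import numpy as np

FS = 16000          # assumed sample rate (samples per second)
CLIP_S, WIN_S = 10, 3  # 10-second source clips, 3-second windows

def windows_from_clip(clip, phrase_end, n_windows=30, tol_s=0.25, rng=None):
    """Cut `n_windows` random 3-second windows out of one 10-second clip.

    `phrase_end` is the sample index where "Robert turn on" finishes
    (None if the phrase isn't in this clip). A window is labeled 1 when it
    ends just after the phrase ends, 0 otherwise. Illustrative sketch."""
    rng = rng or np.random.default_rng()
    win, tol = WIN_S * FS, int(tol_s * FS)
    examples = []
    for _ in range(n_windows):
        start = int(rng.integers(0, len(clip) - win + 1))
        end = start + win
        # Positive label only if the window ends right after the phrase.
        label = int(phrase_end is not None and 0 <= end - phrase_end <= tol)
        examples.append((clip[start:end], label))
    return examples

# 100 ten-second clips at 30 windows each would give 3,000 binary examples.
```

Note how unbalanced the resulting labels are: only windows ending right at the phrase get a 1, which matters for the scenario discussed next.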
Go ahead. A student asks: so the idea right now is to manually label the data; is this manually labeled? Yes, I would, actually. If you have 100 examples, it's not that hard to just listen to them on your laptop with some audio playing software, figure out when they finish saying "Robert turn on," and at that moment put a one in the target label, because that is really when you want the lamp to turn on. Makes sense? [00:23:15] Cool. Any other questions? Actually, feel free to ask clarifying questions. Yeah, go ahead. A student asks: I wonder if this is going to cause the problem that the ones are too sparse. Oh sure, let me get back to that. Anything else? [00:23:29] Another student asks: is there a specific reason we only train with 3 seconds of voice instead of five, since some people's voices are slower? Oh, I see: why do we use 3 seconds and not 4 or 5 seconds, and is that another hyperparameter you can test? So, I don't know; you'd have to say it really slowly for it to take longer than 3 seconds, right? "Robert turn on." So again, it's a design choice. [00:24:05] All right. So let's say you do this, feed it to a supervised learning algorithm, and train a neural network, and let's say that when you run this algorithm you end up with 99.5% accuracy, [00:24:28] but you find that the algorithm has zero detections. [00:24:43] What I mean is that whatever audio you give it, it just outputs zero all the time; the algorithm just says, nope, I never heard the phrase "Robert turn on."
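To see how 99.5% accuracy and zero detections can coexist, here is a tiny illustration on a made-up dev set whose class balance mirrors the lecture's numbers; the 1,000-window size and 5 positives are my own assumed figures.

```python
import numpy as np

# Illustrative dev set: 1,000 windows, only 5 of which contain the trigger
# phrase, mirroring the heavily unbalanced data the clipping procedure produces.
y_true = np.zeros(1000, dtype=int)
y_true[:5] = 1

# A degenerate model that always outputs 0 ("never heard the phrase"):
y_pred = np.zeros(1000, dtype=int)

accuracy = float((y_pred == y_true).mean())   # 0.995: looks great
recall = float(y_pred[y_true == 1].mean())    # 0.0: zero detections
```

On a dev set this unbalanced, accuracy alone tells you almost nothing about whether the trigger word is ever detected, which is exactly the point of the discussion that follows.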
right a [00:25:17] decisions that project leader right a tech leader or Co needs to make these [00:25:19] tech leader or Co needs to make these are actually like pretty much exactly [00:25:20] are actually like pretty much exactly the decisions you need to make and I [00:25:22] the decisions you need to make and I find that um one of the ways to gain [00:25:25] find that um one of the ways to gain this type of experience if you you know [00:25:27] this type of experience if you you know find a job with a good AI team and work [00:25:29] find a job with a good AI team and work with them for five years right and then [00:25:31] with them for five years right and then you actually live through this and you [00:25:32] you actually live through this and you see what they do but instead of needing [00:25:35] see what they do but instead of needing you to go and spend five years to see 10 [00:25:37] you to go and spend five years to see 10 examples of this I'm trying to step you [00:25:40] examples of this I'm trying to step you through maybe one example in in in one [00:25:42] through maybe one example in in in one hour so so instead of uh you know [00:25:45] hour so so instead of uh you know gaining this experience through work [00:25:48] gaining this experience through work experience which is great but takes many [00:25:50] experience which is great but takes many many years many many months uh hoping to [00:25:54] many years many many months uh hoping to you know let's just put you in the [00:25:55] you know let's just put you in the position of making these decisions you [00:25:56] position of making these decisions you can learn from that much faster right um [00:25:59] can learn from that much faster right um but [00:26:01] but so uh and and all the examples I'm [00:26:04] so uh and and all the examples I'm giving are actually completely realistic [00:26:05] giving are actually completely realistic right there either exactly or very [00:26:08] right there either 
similar to things I have seen in actual, very real projects. [00:26:13] So the question is: your learning algorithm gives this result, 99.5% accuracy, zero detections. What do you do? Let me mention some of the answers I really liked. [00:26:26] You know, when I think of building learning algorithms, the process is often: specify a dev set and/or test set that measures what you care about. You don't always have to do it, but it's good hygiene; it sharpens the clarity of your thinking to have a very clear specification of the problem. And I think one insight out of this is that your dev set can be really out of whack, so unbalanced that accuracy on your dev set doesn't translate to what you actually care about, because presumably the model is 99.5% accurate on the dev set as well, but this
performance is terrible. [00:27:06] So it's doing great on the dev set on your accuracy metric but giving terrible real performance. I think of it as good hygiene, good sound practice, to make sure you at least have a dev set and an evaluation metric that correspond more closely to what you care about. So making the dev set more balanced, with equal numbers of positive and negative examples, would be a good step toward that. [00:27:33] And then a few people talked about giving higher weights to the positive examples. One way to do this is to resample your training and dev sets to make them closer to a balanced ratio of positive to negative examples; that would be okay. The other way, without resampling, would be to just give the positive examples a greater weight in the loss.
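The resample-versus-reweight choice can be sketched in a few lines of NumPy (a minimal sketch; the array names, the 1-in-1,000 positive rate, and the 1:10 target ratio are illustrative assumptions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labels: roughly 1 positive clip per 1,000 (illustrative imbalance).
y = (rng.random(100_000) < 0.001).astype(int)
pos_idx = np.flatnonzero(y == 1)
neg_idx = np.flatnonzero(y == 0)

# Option A: resample. Keep all positives, subsample negatives toward a
# target ratio (1:10 here rather than a strict 1:1, so fewer negatives
# get thrown away).
keep_neg = rng.choice(neg_idx, size=10 * len(pos_idx), replace=False)
resampled_idx = np.concatenate([pos_idx, keep_neg])

# Option B: reweight. Keep every example, but give positives a larger
# per-example weight in the loss (inverse-frequency weighting).
weights = np.where(y == 1,
                   len(y) / (2 * len(pos_idx)),
                   len(y) / (2 * len(neg_idx)))
```

Either the index array or the weight vector would then feed the training loop; most frameworks expose the same two ideas as weighted samplers or class weights.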
[00:28:00] I would probably resample. Another thing you could do, in the interest of speed, even if it's not the mathematically most sound thing to do, is to change the target labels to be a bunch of ones right after the phrase. This is a hack; it's not formally rigorous, but if you've implemented the rest of this code already, it might be a reasonable, slightly hacky thing to do, and it might work well enough. [00:28:32] I don't know if I would want to try to write an academic research paper with this method; maybe you'd get away with it, but I think academic reviewers might raise their eyebrows. If you want something quick and dirty that just works, though, I think labeling the ones is fine.
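On frame-level labels, the hack might look like this (a sketch: the half-second window matches the lecture's example, while the 100 frames-per-second rate and the array shapes are my assumptions):

```python
import numpy as np

def extend_positive_labels(y, frames_per_sec=100, window_sec=0.5):
    """After each frame labeled 1 (the wake phrase just ended),
    also label the next ~0.5 s of frames as 1."""
    y = np.asarray(y).copy()
    n_extra = int(window_sec * frames_per_sec)
    for t in np.flatnonzero(y == 1):
        y[t + 1 : t + 1 + n_extra] = 1
    return y

# 10-second clip at 100 frames/s; "Robot, turn on" ends at frame 300.
labels = np.zeros(1000, dtype=int)
labels[300] = 1
labels = extend_positive_labels(labels)  # frames 300-350 are now 1
```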
So changing a bunch of labels to be one, so that a clip that ends just a little bit after "Robot, turn on" is still labeled one, would be pretty reasonable. [00:28:59] This would be saying that anywhere within maybe a 0.5-second window after "Robot, turn on" finishes, it's okay to turn on the light: anytime within that window, say within half a second after "Robot, turn on" has been said, you kind of want to be turning on the lamp. This would be a way to just get more labels of ones in there. Does that make sense? [00:29:37] [Student] With rebalancing your data sets, like the class imbalance: how does that translate to when you deploy this? You're not going to hear "Robot, turn on" as much, right? Like, one out of 1,000 might be reflective of what you expect
to see. [00:29:54] [Andrew] Yeah, right. So, how to put it... this is sort of a dev set and evaluation measure kind of question. A couple of the metrics that people often use when actually working on this are: one, when someone says "Robot, turn on," what is the chance that the lamp actually wakes up, that it turns on? And two, if no one is saying anything to the lamp, how often does it randomly turn on by itself, without you having said anything? Those are the two metrics people actually use. [00:30:27] And sometimes you could also try to combine them into a single-number evaluation metric. So you could define a data set to measure both of these things, and then hopefully find a way to combine them into a single real number, which I think is one of the ways we talked about in the videos as well.
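Those two numbers, plus one possible way to fold them into a single score, can be computed like this (a sketch over per-clip labels; the toy predictions and the particular combination `det - fa` are my assumptions, not the lecture's):

```python
import numpy as np

def wake_word_metrics(y_true, y_pred):
    """y_true[i] = 1 if clip i contains the wake phrase; y_pred[i] is
    the detector's decision. Returns (detection_rate, false_alarm_rate)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    detection_rate = (y_pred[y_true == 1] == 1).mean()    # P(fires | phrase said)
    false_alarm_rate = (y_pred[y_true == 0] == 1).mean()  # P(fires | not said)
    return detection_rate, false_alarm_rate

# Toy evaluation set: 4 clips with the phrase, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
det, fa = wake_word_metrics(y_true, y_pred)  # det = 0.75, fa = 1/6
score = det - fa  # one crude single-number combination
```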
[00:30:47] Does that make sense? But I think the question is really: what is it that satisfies a user need? [00:30:57] Oh, and one thing about the straightforward way of rebalancing: if you don't rebalance, then your whole data set just has very few positive examples. But if you throw away negative examples, cutting them down until you have exactly equal numbers of positives and negatives, you've actually thrown away a lot of negative examples. Does this make sense? [00:31:20] So one problem with the straightforward way of rebalancing is that in your 10-second test clip that we collected by running around Stanford, you have one example of "Robot, turn on," and so if you want exactly perfectly
balanced positives and negatives, it means you're allowed to clip out only one negative example. You can say that's a negative and that's a positive, and you can't clip out more negative examples from the rest. So if you insist on a perfect rebalance, you're actually throwing away a lot of negative examples that could be helpful for the learning algorithm. [00:32:15] So, you know, a lot of the workflow of building learning algorithms feels more like debugging, because what happens in a typical machine learning workflow is: you implement something and it doesn't work, so you figure out what the problem is and you fix it, like rebalancing, or reweighting, or adding more ones, and that fixes the current problem. And then after fixing the current problem, which
is the one we just solved, say, you then come across a new problem and you have to solve that; you fix that problem, and you come across another new problem. [00:32:49] So I find that when I'm working on a machine learning project, the workflow often feels more like software debugging than software development, because you're often trying to figure out what doesn't work and then trying to fix that; and after you fix that problem, another bug surfaces, and you squash that one too, and you kind of keep doing that until the algorithm works. So if I keep talking about "your algorithm doesn't work, what do you do next?", that's kind of the theme of today's presentation. That is what the workflow, the day-to-day work of developing a learning algorithm, is usually like: it doesn't work, you fix it, it
still doesn't work, you fix that, it still doesn't work, you fix it, and you do that enough times until it works. That is actually what working on a learning algorithm often looks like. [00:33:37] All right, so let's say you fix that problem: you've added a lot more ones, like I did on that previous board, so the data set isn't as unbalanced. And you conclude, through doing error analysis, that your algorithm is overfitting. [00:34:40] Okay, good. So let's say that you find it now achieves 98% accuracy on the training set and 50% accuracy on the dev set. So there's a very large gap between your training and your dev set performance, and that's a clear sign of overfitting.
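That train/dev comparison is easy to mechanize (a sketch; the thresholds and the target accuracy below are illustrative assumptions, and the only numbers from the lecture are the 98%/50% example):

```python
def diagnose(train_acc, dev_acc, target_acc=0.99,
             bias_gap=0.02, variance_gap=0.05):
    """Crude heuristic: train accuracy far below the target suggests
    high bias (underfitting); a large train-dev gap suggests high
    variance (overfitting). Thresholds are illustrative, not standard."""
    notes = []
    if target_acc - train_acc > bias_gap:
        notes.append("high bias: train accuracy far from target")
    if train_acc - dev_acc > variance_gap:
        notes.append("high variance: large train-dev gap (overfitting)")
    return notes or ["looks okay"]

notes = diagnose(0.98, 0.50)  # flags high variance only
```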
[00:35:02] And I think in one of the earlier questions someone talked about data augmentation; when you have this clear sign of overfitting, this is a good time to consider data augmentation. So let's say you go ahead and do it. For audio, this is how you could do data augmentation: collect a bunch of background audio. [00:35:23] You know, if you're trying to build a lamp that might go into people's homes, then you could go into your friends' homes and, with their permission, record what the background sound in their home sounds like: maybe people talking in the background, maybe the TV on in the background, whatever goes on in people's homes. [00:35:41] And then it turns out that if you take, say, a one-second clip of "Robot, turn on" and you add that to a background
clip, then you can synthesize an audio clip of what it would sound like in your friend's house if someone were to suddenly pop up and say "Robot, turn on" against the background sound of your friend's house. [00:36:10] And if you want to make this system robust... so actually, for example, I know someone who lives, unfortunately, close to a train station, and so their house actually has a lot of train noise from the Caltrain. So what you can do to make your system more robust is also take a clip of, say, Caltrain noise, and if you take that noise and take, in this case, a one-second or three-second clip of someone saying "Robot, turn on" and synthesize that on top of the train in the background, then what you end up with is a 10-second clip of someone saying "Robot, turn on" against a noisy,
train-in-the-background type of noise. [00:36:58] And so to do data augmentation, or data synthesis, you can take some one-second clips of people saying "Robot, turn on" against a quiet background, and then take some one-second clips of people saying random words (let's say "Cardinal," say "Stanford") and synthesize these against the train noise background. Then you would have what sounds like: train noise, train noise, "Robot, turn on," train noise, "Cardinal," train noise. [00:37:28] And then you could generate the labels: zeros there, ones there, and then zeros there. Because if this is what it actually sounded like in a user's home, you want the lamp to turn on after "Robot, turn on" but not after these random words; you can pick different random words.
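The overlay-and-label step can be sketched with waveforms as NumPy arrays (a minimal sketch; the 16 kHz sample rate, the random stand-in signals, and the amplitudes are assumptions; a real pipeline would load actual recordings and control the signal-to-noise ratio of the mix):

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16_000  # assumed sample rate

# Stand-ins for real recordings: 10 s of background noise and a 1 s
# clip of someone saying "Robot, turn on" in a quiet room.
background = 0.1 * rng.standard_normal(10 * sr)
wake_clip = rng.standard_normal(1 * sr)

def synthesize(background, wake_clip, start_sec, sr=16_000):
    """Overlay the wake-phrase clip onto the background at start_sec and
    emit sample-level labels: zeros everywhere, a 1 right after the
    phrase ends (the turn-on target)."""
    x = background.copy()
    s = int(start_sec * sr)
    x[s : s + len(wake_clip)] += wake_clip
    y = np.zeros(len(background), dtype=int)
    y[s + len(wake_clip)] = 1
    return x, y

x, y = synthesize(background, wake_clip, start_sec=4.0)
```

The same function, called with clips of random words instead of the wake phrase, would produce all-zero labels, giving the "train noise, Cardinal, train noise" negatives described above.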
[00:38:10] So what I'd like you to do is evaluate three different possible ways to collect noisy data, to collect this type of background data. [00:38:33] Let's say you and your team have brainstormed a few different ways to collect this type of background noise data, and let's say you've decided that you would like to collect 10 hours of it. Okay, so I'm going to present to you three options. [00:39:04] One is: run around Stanford and place microphones around Stanford or in your friends' homes. Do this with consent; in California you're actually not supposed to record people without their knowledge and consent. [00:39:23] The second is: download clips online. It turns out if you go
to YouTube, there are these 10-hour-long clips of, you know, rain noise or cars driving around. And again, if you do that, find something that's Creative Commons or appropriately licensed. [00:39:47] The third thing you could do is use Amazon Mechanical Turk: [00:39:59] on Mechanical Turk we can have people all around the world be paid modest amounts of money to submit audio clips. [00:40:08] So for the next exercise, and I want you to have this discipline, what I want you to do is estimate. Let's see, what time is it now? Okay, it's 12:30 p.m.
right now. [00:40:26] What I want you to do is write down three numbers in the next exercise, to estimate: if you were to go do this right now, by what time would you have finished? If you were to do option one, what time would you finish? If you were to do option two, what time would you finish? Option three? Your goal is to collect 10 hours of data through one of these mechanisms. Does that make sense? [00:40:52] So it's 12:30 p.m. now, and what I'd like you to do is just write down three numbers: what time will it be by the time you've collected 10 hours of data, you know, from around Stanford? And if you think you could do it by tonight, then write 9:00 p.m.;
if you think it'll take you one week, then write the date one week from now; whatever it is, just write down three numbers for these three activities, okay? Let's do this one relatively quickly; can people do this in maybe a minute and a half? [00:41:29] All right, cool. This is interesting. [00:41:41] What do people think? There's actually surprisingly large variability; I'll mention one thing that surprised me, and I'll give you my own assessment. [00:41:50] I think that, you know, when I'm leading startup teams, we tend to be very scrappy. And so, to collect 10 hours of data: if you have three friends with laptops, you can collect three hours of data per hour, because you've got three recordings going in parallel. So if I were doing this with, say, two other friends, I bet we could get this done by tonight, because if you need nine
hours of data, each person needs to collect three hours of data, and you run around Stanford and keep the microphones running; I bet I could get this done by 6 p.m., maybe even earlier, I don't know. [00:42:28] Downloading clips online is actually an interesting one; maybe about the same time. It turns out one tricky thing about downloading clips online is that there are people who have trouble sleeping at night, so they listen to highway noise or whatever, and so there are these, you know, 20-hour clips of highway noise on YouTube that you can find. But I don't know how those were generated, and I suspect a lot of them loop, meaning it's the same one hour played over and over. So I actually think it's harder than one might guess to get 10 hours of non-repetitive data, and it's one of
of [00:43:11] non-repetitive data and it's one of those things you know if I take an R of [00:43:13] those things you know if I take an R of high highway sound and loop it you can't [00:43:16] high highway sound and loop it you can't tell the difference because all highway [00:43:17] tell the difference because all highway sound sounds the same I just can't tell [00:43:20] sound sounds the same I just can't tell one minute of Highway sound from another [00:43:21] one minute of Highway sound from another one but um if you have one hour of [00:43:23] one but um if you have one hour of Highway sound looped 10 times the [00:43:26] Highway sound looped 10 times the learning Alm wasy perform much less well [00:43:28] learning Alm wasy perform much less well than if you have 10 hours of fresh [00:43:30] than if you have 10 hours of fresh Highway sound so this I would actually [00:43:32] Highway sound so this I would actually have a harder time doing I think I [00:43:34] have a harder time doing I think I probably I I I would Pro if I were doing [00:43:36] probably I I I would Pro if I were doing this I because of these problems I would [00:43:39] this I because of these problems I would probably budget until sometime [00:43:41] probably budget until sometime tomorrow right may maybe maybe 9:00 p.m. [00:43:44] tomorrow right may maybe maybe 9:00 p.m. 
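As an aside (my sketch, not part of the lecture): one rough way to check whether a downloaded "noise" file is a loop is to look at its normalized autocorrelation; a clip that repeats itself shows a near-1 peak at the loop lag. A minimal numpy sketch, with short synthetic signals standing in for real audio:

```python
import numpy as np

def find_loop_lag(signal, min_lag, threshold=0.9):
    """Return the smallest lag >= min_lag at which the clip nearly
    repeats itself, or None if no strong repeat is found."""
    x = signal - signal.mean()
    n = len(x)
    for lag in range(min_lag, n // 2):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        if denom == 0:
            continue
        corr = (a * b).sum() / denom  # normalized autocorrelation at this lag
        if corr >= threshold:
            return lag
    return None

# Synthetic stand-in: "one hour" of noise (here 1000 samples) looped 3 times,
# versus the same amount of fresh, non-repeating noise.
rng = np.random.default_rng(0)
hour = rng.standard_normal(1000)
looped = np.tile(hour, 3)
fresh = rng.standard_normal(3000)

print(find_loop_lag(looped, min_lag=500))  # detects the 1000-sample loop
print(find_loop_lag(fresh, min_lag=500))   # no strong repeat -> None
```

On real recordings you would work with downsampled audio and tolerate slightly lower thresholds, since re-encoding blurs exact repeats.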
[00:43:46] or something. Maybe that's doable, I'm not sure. Um, the one surprise to me was some people thought they could do this by tonight. Again, I've used Amazon Mechanical Turk; it's actually a huge process to set up Mechanical Turk, get people on board, and especially to get them on a microphone. I don't know, maybe you implement something in Flash so they can speak into their web browser, but Flash isn't well supported. So it's actually not that easy to get a lot of Turkers to do this, and the global supply of Turkers is also not unlimited. So if I were doing this, I would probably budget, I don't know, maybe a week or something, right? Hard to say, I'm not sure.

[00:44:24] Um, but so the specific opinion isn't that important; I want you to go through this exercise because this is how an efficient startup team should, you know, brainstorm a list of things, and then you all figure out how long you think it'll take to do these things. And I think we can have a debate about how high quality the data is. I think you can get very high quality data from this and from this; I just don't trust a lot of those online audio sources. But if this is really fast and you can get pretty high quality data, I would probably do this to collect the background sound to get going, right? I think that part of the workflow I see of, you know, fast-moving teams is pretty much exactly what you did, which is why I have that exercise of brainstorming the list of options, then really estimating, oh, what time can we get this done, and then using that to pick an option, right? Um, and then I want to just mention one last thing, which is that these differences matter, right? You know, I've built a lot of speech systems, a lot of machine learning systems.

[00:45:34] Oh, and I think, by the way, if you do everything we just described, and you'll see this later in a problem set, with pretty much this set of ideas that we just went through today, you can actually build a pretty decent trigger word detection system, or wake word detection system. In fact we ask you to do pretty much this in a later homework exercise. But now, when you get to that homework exercise, when you do RNNs, you know how you could come up with this sort of process yourself, if you didn't already know how to make these types of choices.

[00:46:06] Yeah, just one question: at what point in my research do I have to think about how my microphone will affect my results? At the beginning I could think it's not important, that my laptop microphone is the same as the one that is used when I run around Stanford or when I download clips, but it might mess up my data a lot. So at what point do I have to think about it?

[00:46:32] Yeah, so my advice, so how does your microphone affect your results, right? My advice would be to get something going quick and dirty, and then develop a dev set with the actual types of data you think you'll get on your real microphone, and then see if it is a problem. It may be; different microphones do have different characteristics. And if it is a problem, then go back and think about how you collect data that's more representative of how you test. Okay, I want to mention one more quick thing (oh, and I'll hand out the course surveys in a moment). I want to do something real quick, which is, I want to tell you why these things really matter. Which is, if this is performance, right, let's say, actually, let's say error, and this is time, right, and if this is today, and
[00:47:18] you're the CEO of this startup, remember, that's what we're doing in this lesson, and this is six months from now, and this is 12 months from now. Great. Um, you know, maybe a competitor, actually, I don't know, maybe because we talked about this so much in this class, maybe two of you are going to build this startup, but, a competitor. Um, but over time, most machine learning teams, you know, the error actually goes down over time as you work on problems, right? I mean, this is what I see in tons of practical projects: you work on the project, improve the system, and the error actually goes down over time as you work on this, over the next 12 months, say. Right, if you're really the CEO of a startup doing this, it turns out that the best startups have the discipline to constantly be the most efficient.

[00:48:06] Um, don't do something that takes you two days if you can get a similar result in one day. The difference is not that you're one day slower; the difference is that you're 2x faster, right? And having that mindset, if we can take this whole chart and compress it on the horizontal axis, then you want to be the startup that, you know, makes the same amount of progress in 6 months instead of 12 months, right? Because if you're able to do this, then your startup will actually perform much better in the marketplace, assuming, you know, accuracy is important, which it seems to be for wake words. And so don't think of this as saving you a day here and there; think of this as making your team twice as fast, and that's the difference between this level of performance and that level of performance. So that's why, when I'm, you know, building teams and executing these projects, I tend to be pretty obsessive about making sure we're very efficient in exploring the options. And don't wait till tomorrow to collect data of dubious quality when you have a better idea for collecting data by today, because the difference is not that you wasted 12 hours; the difference is you are twice as slow as a company, right? So I think, hopefully, through this example and your ongoing experiences throughout this quarter, you can continue to get better at this, right?

[00:49:22] Um, last thing we want to do: we're about halfway through the course, so we want to hand out a survey, an anonymous survey, to get some feedback from you about this class. And whenever we get these surveys, thanks to previous generations of students' feedback, we've been gradually making the class better. So I think, uh, Kian and I actually read all of these questions ourselves and
[00:49:49] try to find ways to take your feedback to improve the class. So if you can take, you know, five minutes to fill out the survey, you can hand it in, just drop it off anonymously up here in front; I'd be very grateful for your suggestions. Okay.

[00:50:04] Um, I think if you haven't entered your ID yet, you could still do so, but that's it for today. So please fill out the survey and anonymously just drop it off at the back or front, and then we'll wrap up. Okay, thank you.

================================================================================
LECTURE 007
================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 7 - Interpretability of Neural Network
Source: https://www.youtube.com/watch?v=gCJCgQW_LKc
---
Transcript

[00:00:04] Hi everyone, welcome to lecture number seven. So up to now, I believe, can you hear me in the back? Is it easy? Okay. So in the last set of modules that you've seen, you've learned about convolutional neural networks and how they can be applied to imaging. Notably, you've played with different types of layers, including pooling,
max pooling, average pooling, and convolutional layers. You've also seen some classification with the most classic architectures, all the way up to Inception and ResNets. And then you jumped into advanced applications like object detection with YOLO and the R-CNN and Faster R-CNN series, with an optional video, and finally face recognition and neural style transfer, which we talked a little bit about in the past lectures. So today we're going to build on top of everything you've seen in this set of modules to try to delve into the neural networks and interpret them. Because you noticed, after seeing the set of modules up to now, that a lot of improvements of the neural networks are based on trial and error: we try something, we do hyperparameter search, sometimes the model improves, sometimes it doesn't; we use a validation set to find the right set of methods that would make our model improve. It's not satisfactory from a scientific standpoint, so people are also searching: how can we find an effective way to improve our neural networks, not only with trial and error, but with theory that goes into the network, and visualizations.

[00:01:43] So today we will focus on that. We first will see three methods, saliency maps, occlusion sensitivity, and class activation maps, which are used to kind of understand what the decision process of the network was: given this output, how can we map the output decision back onto the input space, to see which parts of the input were discriminative for this output. And later on we will delve even more into the details of the network by looking at intermediate layers: what happens at an activation level, at a layer level, and at the network level, with another set of methods: gradient ascent, class model visualization, dataset search, and deconvolution. We will spend some time on the deconvolution because it's a cool type of mathematical operation to know, and it will give you more intuition on how the convolution works from a mathematical perspective. If we have time, we'll go over a fun application called Deep Dream, with super cool visuals, for some of you who know it. Okay, let's go. The Menti code is on the board if you guys need to sign up.

[00:02:55] So, as usual, we go over some contextual information and some small case studies, so don't hesitate to participate. You've built an animal classifier for a pet shop and you gave it to them. It's super good, it's been trained on ImageNet plus some other data, and what is a little worrying is that the pet shop is a little reluctant to use your network, because they don't understand the decision
process of the model. So how can you quickly show that the model is actually looking at a specific animal, let's say a cat, if I give it an input that is a cat? We've seen that together one time already, remember? So I'll go quickly. You have a network; here's a dog given as an input to a CNN. The CNN, assuming the constraint is that there is one animal per image, was trained with a softmax output layer, and we get a probability distribution over all animals: iguana, dog, car, cat, and crab. And what we want is to take the derivative of the score of dog and backpropagate it to the input, to know which parts of the input were discriminative for this score of dog. Does that make sense? Everybody remembers this? And so the interesting part is that this value has the same shape as x, so it's the size of the input; it's a matrix of numbers, and if the numbers are large in absolute value, it means the pixels at those locations had an impact on the score of dog, okay?

[00:04:24] What do you think the score of dog is? Is it the output probability? Now, what do I mean by the score of dog? Yep, the score of the dog, yeah, but is it the 0.85 output? That's what I need. Yes: it's the score pre-softmax, the score that comes before the softmax. So, as a reminder, here's a softmax layer and this is how it could be presented. You get a vector that is a set of scores that are not necessarily probabilities; they're just scores between minus infinity and plus infinity. You give them to the softmax, and what the softmax is going to do is output a vector where all the probabilities sum up to one, okay?

[00:05:24] And so the issue is, if instead of using the derivative of what we called y-hat last time, we use the score of dog, we will get a better representation here. The reason is, in order to maximize this number, the score of dog divided by the sum of the scores of all animals, or, maybe I should write, the exponential of the score of dog divided by the sum of the exponentials of the scores of all animals, one way is to minimize the scores of all the other animals, rather than maximizing the score of dog. So you see, maybe by moving a certain pixel we minimize the score of fish, and so this pixel will have a high influence on y-hat, the general output of the network, but it actually doesn't have an influence on the score of dog one layer before. Does it make sense? So that's why we would use the scores pre-softmax instead of the scores post-softmax, which are the probabilities, okay? And what's fun is, you cannot see it well here, but the slides are online if you want to take a look.
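To make the pre- vs post-softmax point precise (this little derivation is my addition, not spoken in the lecture): write $s_c$ for the pre-softmax score of class $c$ and $\hat{y}_c$ for the softmax output. Then

```latex
\hat{y}_{\text{dog}} = \frac{e^{s_{\text{dog}}}}{\sum_j e^{s_j}},
\qquad
\frac{\partial \hat{y}_{\text{dog}}}{\partial s_c}
  = \hat{y}_{\text{dog}}\left(\mathbf{1}[c=\text{dog}] - \hat{y}_c\right),
```

so the gradient of the probability $\hat{y}_{\text{dog}}$ mixes in every other class's score through the $-\hat{y}_c$ term: a pixel can raise $\hat{y}_{\text{dog}}$ purely by lowering, say, the score of fish. Backpropagating $s_{\text{dog}}$ itself avoids that.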
On your computer you can see that some of the pixels that are roughly at the same positions as the dog in the input image are stronger, so we see some white pixels here, and this can probably be used to segment the dog. So you could use a simple thresholding to find where the dog is, based on this pixel score map. It doesn't work too well in practice, we have better methods to do segmentation, but this can be done as well.

[00:07:03] So this is what is called saliency maps, and it's a common technique to quickly visualize what the network is looking at; in practice we will use other methods.

[00:07:14] So here's another contextual story. You've built the animal classifier; they're still a little scared, but you want to prove that the model is actually looking at the input image at the right position. You don't need to be quick, but you have to be very precise.

[00:07:36] Yeah, no, the saliency map is literally this thing here: it's the values of the derivative. You take the score of dog, you backpropagate the gradient all the way to the input, it gives you a matrix that is exactly the same size as x, and you use a specific color scheme to see which pixels are the strongest. Thank you.

[00:08:04] Okay, so here we have our CNN, the dog is forward propagated, and you get a probability score for the dog. Now you want a method that is more precise than the previous one, but not necessarily as fast, and this one we've talked about a little bit: it's occlusion sensitivity. So the idea here is to put a gray square on the dog, and we propagate this image, with the gray square at this position, through the CNN. What we get is another probability distribution that is probably similar to the one we had before, because the gray square doesn't seem to impact the image too much.
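Stepping back to the saliency map for a moment: the recipe just recapped (forward propagate, take the class's pre-softmax score, backpropagate it to the input, look at absolute values) can be sketched without any framework. This is an illustrative stand-in, not the lecture's code: the "network" is a tiny random two-layer MLP on a fake 8x8 image, with the backward pass written by hand, and class index 1 arbitrarily playing the role of "dog".

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, n_classes = 8, 8, 5                         # fake image size and classes
W1 = rng.standard_normal((16, H * W)) * 0.1       # hidden-layer weights
W2 = rng.standard_normal((n_classes, 16)) * 0.1   # output (pre-softmax) weights

def saliency_map(x_img, class_idx):
    """|d s_class / d x| reshaped to the image, where s is the
    pre-softmax score of the chosen class."""
    x = x_img.reshape(-1)
    z = W1 @ x                      # hidden pre-activation
    # s = W2 @ relu(z); we only need the gradient of s[class_idx] w.r.t. x
    da = W2[class_idx]              # d s_c / d a, with a = relu(z)
    dz = da * (z > 0)               # backprop through the ReLU
    dx = W1.T @ dz                  # d s_c / d x
    return np.abs(dx).reshape(H, W)

img = rng.standard_normal((H, W))
smap = saliency_map(img, class_idx=1)   # class 1 stands in for "dog"
print(smap.shape)   # (8, 8): same shape as the input, as in the lecture
```

With a real CNN you would let the framework's autograd compute the input gradient instead of writing the backward pass manually; the visualization step (absolute values, one color scheme) is the same.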
[00:08:39] At least from a human perspective, we still see a dog, right? So the score of "dog" might be high: 83%, probably. [00:08:44] What we can say is that we can build a probability map corresponding to the class "dog", and we write down on this map how confident the network is when the gray square is at this testing location. For our first location it seems that the network is very confident, so let's put a red square here. [00:09:04] Now I'm going to move the gray square a little bit, shifting it just as we do for convolution, and I'm going to send this new image through the network again. It's going to give me a new probability distribution output, and the score of "dog" might change. Looking at this score, I'm going to say: okay, the network is still very confident that there is a dog here. And I continue; I shift it again; same thing, the network is still very confident that there is a dog.

[00:09:31] Now I shift the square vertically down, and I see that the face of the dog is partially occluded. The probability of "dog" will probably go down, because the network cannot see one eye of the dog and is not confident that there is a dog anymore. So the confidence of the network probably went down, and I'm going to put a square that is tending toward blue. [00:09:56] I continue and shift it again, and here we don't see the dog's face anymore at all, so the network might classify this as a chair, because the chair is more obvious than the dog now; the probability score of "dog" might go down, so I'm going to put a blue square here. And I continue: here we don't see the tail of the dog; it's still fine, the network is pretty confident. And so on. [00:10:24] What I will look at now is this probability map, which tells me roughly where the dog is.
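The sliding-gray-square procedure can be sketched in a few lines of NumPy. The `model_dog_prob` function below is a stand-in, not a real CNN: it just pretends the dog evidence lives in the top-left patch of the image so that the script is self-contained.

```python
import numpy as np

# Occlusion-sensitivity sketch: slide a gray square over the image and
# record the class probability at each position.

def model_dog_prob(img):
    # Hypothetical classifier: pretend the "dog" lives in the top-left
    # 4x4 region, so confidence drops when that region is grayed out.
    evidence = img[:4, :4].sum()
    return 1.0 / (1.0 + np.exp(-0.5 * (evidence - 4.0)))

def occlusion_map(img, patch=3, gray=0.0, stride=1):
    rows = (img.shape[0] - patch) // stride + 1
    cols = (img.shape[1] - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = img.copy()
            r, c = i * stride, j * stride
            occluded[r:r + patch, c:c + patch] = gray  # the gray square
            heat[i, j] = model_dog_prob(occluded)      # re-run the model
    return heat

img = np.ones((8, 8))
heat = occlusion_map(img)
# confidence dips where the square covers the "dog" region (top-left)
print(heat.shape)  # (6, 6)
```

Shrinking `patch` gives a finer map at the cost of more forward passes, which is exactly the precision/cost trade-off mentioned above.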
[00:10:30] Here we used a pretty big gray square compared to the size of the image; the smaller the gray square, the more precise this probability map is going to be. Does that make sense? So, if you have time, you can take your time with the pet shop to explain to them what's happening. [00:10:59] Yeah, we will see that in the next slide; that's correct.

[00:11:01] So let's see more examples. Here we have three classes, and these images have been generated by Matthew Zeiler and Rob Fergus; their paper "Visualizing and Understanding Convolutional Networks" is one of the seminal papers that has led the research in visualizing and interpreting neural networks, so I'd advise you to take a look at it, and we will refer to it many times in this lecture. So now we have three examples: a Pomeranian, which is this type of cute dog; a car wheel, which is the true class of the second image; and an Afghan hound, which is this type of dog here in the last image. If you do the same thing as we did before, that's what you would see. [00:11:47] Just to clarify: where we see a blue color, it means that when the gray square was positioned, or centered, at this location, the network was less confident that the true class was Pomeranian. And in fact, if you look at the paper, they explain that when the gray square was here, the confidence in Pomeranian went down because the confidence in tennis ball went up; the Pomeranian dog has a tennis ball in its mouth. [00:12:16] Another interesting thing to notice is in the last picture: you see that there is a red color in the top left of the image, and this is exactly what you mentioned, Adam: when the square was on the face of the human, the network was much more confident that the true class was the dog, because we removed information that was very meaningful to the network, namely the face of the human. And similarly, if you put the square on the dog, the class the network output was "human". Does that make sense? Okay, so this is called occlusion sensitivity, and it's the second method you have now seen for interpreting where the network looks on an input.

[00:13:03] So let's move on to class activation maps. I don't know if you remember, but two weeks ago, when Pranav discussed the techniques he uses in healthcare, he explained that he takes a chest x-ray and manages to tell the doctor where the network is looking when predicting a certain disease based on that chest x-ray, right? Remember that? That was done through class activation maps, and that's what we're going to see now. [00:13:39] One important thing to notice is that we've discussed how classification networks seem to have a very good localization ability, and we can see it with the two methods we previously discussed. The same goes for those of you who have read the YOLO paper you studied in this set of modules: the YOLOv2 algorithm was first trained on classification, because classification has a lot of data, a lot more than object detection; trained on classification, it built a very good localization ability, and then it was fine-tuned and retrained on object detection datasets. Okay. And so the core idea of class activation maps is to show that CNNs have a very good localization ability even if they were trained only on image-level labels. So we have this network; it's a very classic network used for classification.
[00:14:33] We give it a kid and a dog; this class activation map work is coming from an MIT lab, Zhou et al. in 2016. You forward-propagate this image of a kid with a dog through the network, which has a classic series of conv and pooling layers, several of them, and at the end you usually flatten the last output volume of the conv and run it through several fully connected layers, which play the role of a classifier, send it to a softmax, and get the probability output. [00:15:08] Now what we're going to do is prove that this CNN generalizes to localization. So we're going to convert this same network into another network, and the only part that is going to change is the last part. The downside of using flatten plus fully connected is that you lose all spatial information, right? You have a volume that has spatial information (although it has gone through some max pooling, so it has been downsampled and you have lost some part of the spatial localization), and flattening kills it: you flatten it, you run it through a fully connected layer, and then it's over; it's super hard to find out where an activation corresponds to in the input space. [00:15:50] So instead of using flatten plus fully connected, we're going to use global average pooling (we're going to explain what it is), then a fully connected softmax layer, and get the probability output. And we're going to show that this network can be trained very quickly, because we just need to train one layer, the fully connected one here, and it can show where the network looks, the same as the previous network.

[00:16:13] So let's talk about it in more detail. Assume this was the last conv layer, and it outputs a volume that is sized, to simplify, 4 x 4 x 6: six filters were used in the last conv, so we have six feature maps. Does that make sense? I'm going to convert this, using global average pooling, to just a vector of six values. What is global average pooling? It's just taking these feature maps, each of them, and averaging each one into one number. So now, instead of a 4 x 4 x 6 volume, I have a 1 x 1 x 6 volume, but we can call it a vector. Does that make sense? [00:16:56] What's interesting is that each number actually holds the information of the whole feature map that came before it, averaged into one value. I'm going to put these in a vector, and I'm going to call them activations as usual: a1, a2, a3, a4, a5, a6. As I said, I'm going to train a fully connected layer here with the softmax activation, and the outputs are going to be the probabilities. [00:17:24] What is interesting about that is that the feature maps here, as you know, will contain some visual patterns.
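The global-average-pooling step just described is one line of NumPy on the toy 4 x 4 x 6 volume (random values here, standing in for real conv activations):

```python
import numpy as np

# Global average pooling: average each 4x4 feature map down to one number,
# turning a 4x4x6 volume into a vector of 6 activations a1..a6.
rng = np.random.default_rng(1)
volume = rng.normal(size=(4, 4, 6))   # H x W x channels (toy values)
a = volume.mean(axis=(0, 1))          # shape (6,): one number per feature map
print(a.shape)                        # (6,)
```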
[00:17:30] So if I look at the first feature map, I can plot it here; these are the values. Of course, this one is much more granular than 4 x 4 (it's not a 4 x 4, it's much bigger), but let's say this is the feature map, and it seems that the activations have found something here: there was a visual pattern in the input that activated this feature map, and the filters which generated it, at this location. Same for the second one: there are probably two objects, or two patterns, that activated the filters that generated this feature map. And so on; we have six of those. [00:18:09] After I've trained my fully connected layer, I look at the score of "dog": the score of "dog" is 91%. What I can do is ask: this 91%, how much of it came from each of these feature maps? And the way I can know is that I now have a direct mapping through the weights. I know that weight number one (this edge, you see it) is how much the score depended on the orange feature map. Does that make sense? The second weight, if you look at the green edge, is the weight that multiplied this feature map to give birth to the "dog" output, so this weight is telling me how much this feature map, the green one, influenced the output. Does that make sense? [00:19:07] So now what I can do is take a weighted sum of all these feature maps, and if I just do this weighted sum, I will get another feature map, something like that. And you notice that this one seems to be highly influenced by the green feature map. It probably means the weight there was higher, and it probably means that the second filter of the last conv was the one that was looking at the dog. Does that make sense? Okay. [00:19:44] And then, once I get this feature map: this feature map is not the size of the input image, right? It's the size of the height and width of the output of the last conv. So the only thing I'm going to do is upsample it back, simply, so that it fits the size of the input image, and I'm going to overlay it on the input image to get my class activation map. [00:20:06] The reason it's called a class activation map is that this feature map depends on the class you're talking about. If I was using, let's say, "car" here, the weights would have been different, right? Look at the edges that connect the output activation to the activations of the previous layer: those weights are different, so if I sum all of these feature maps, I'm going to get something else. Does that make sense? So this is class activation maps. And in fact there is a dog here and there's a human there.
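Putting those steps together, here is a minimal CAM sketch in NumPy with the lecture's toy shapes. Everything is a stand-in: random feature maps and weights, and block-repeat upsampling via `np.kron`, which is just the simplest possible way to scale the map back up to the input size.

```python
import numpy as np

# Class-activation-map sketch: weight each feature map of the last conv
# by the "dog" row of the fully connected layer, sum them, then upsample
# the result to (a hypothetical) input size.

rng = np.random.default_rng(2)
fmaps = rng.normal(size=(4, 4, 6))   # last-conv output: 4x4, six feature maps
W = rng.normal(size=(3, 6))          # FC weights: 3 classes x 6 activations
dog = 1                              # class of interest

# weighted sum over the channel axis: M_dog(i,j) = sum_k w_k * A_k(i,j)
cam = np.tensordot(fmaps, W[dog], axes=([2], [0]))   # shape (4, 4)

# naive nearest-neighbour upsample to a 32x32 "input" for overlaying
cam_up = np.kron(cam, np.ones((8, 8)))               # shape (32, 32)
print(cam.shape, cam_up.shape)
```

Using a different class row, say `W[2]`, changes the weights and therefore the map, which is exactly why it is a *class* activation map.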
[00:20:45] And what you can notice is that, if I look at the class "human", weight number one might be very high, because it seems that the visual pattern that activated the first feature map was the face of the kid. Okay. [00:21:00] So what is super cool is that you can take your network and just change the last few layers into global average pooling plus the fully connected softmax layer, and you can do that and visualize very well; it requires a small fine-tuning. [00:21:19] Yeah, so it's a different vocabulary. I would use "saliency maps" for the backpropagation up to the pixels, while "class activation maps" are related to one class. It's not a backpropagation at all; it's just upsampling to the input space based on the feature maps of the last conv, mostly just examining the weights. Not so much of a backpropagation. Yes, any other questions on class activation maps?
[00:21:53] Yeah, that's a good question: does taking the average kill the spatial information? So let me write down a formula here. This is the score that we're interested in; let's say the dog class, c. What you could say is that this score is

S_c = sum_{k=1..6} w_k^c * a_k

where w_k^c is the weight that connects the output activation to the previous layer, and a_k is the activation of the previous layer, i.e. the global average pooling of the k-th feature map:

a_k = (1/16) * sum_{i,j} A_{i,j}^k

with k indexing the feature map, (i,j) the location, and 1/16 the normalization over the 4 x 4 map. Can you see in the back, roughly? [00:22:46] Okay, I can switch the two sums, so I can say that this thing is

S_c = sum_{i,j} (1/16) * sum_{k=1..6} w_k^c * A_{i,j}^k

Does this make sense? I still have the locations; I just moved the sum around. And what I could do is say that the inner term,

M_c(i,j) = sum_{k=1..6} w_k^c * A_{i,j}^k

is the score of the class activation map at location (i,j), the class score for that location, and I'm summing it over all locations. [00:23:57] So just by flipping what the average pooling was doing over the locations, I can say that by weighting, using my weights, all the activations at a specific location across all the feature maps, I get the score of this position with regard to the final output. Does that make sense? So we were not losing the spatial information. The reason we're not losing it is that we know what the feature maps are, right? We know what they are, and we know that they've been averaged, so we can map it back exactly.
[00:24:42] That's because we assume that each filter that generated these feature maps detects one specific thing. So if this is the feature map, and we assume the filter was detecting "dog", we're going to see something just here, meaning that there is a dog here; and if there was a dog in the lower part of the image, we would also have strong activations in that part. [00:25:15] If you want to see more of the math behind it, check the papers, but this is the intuition: you can flip the summations using the global average pooling and show that you keep the spatial information. The thing is, you do the global average pooling, but you don't lose the feature maps, because you know where they were, from the output of the conv, right? So you're not deleting this information. Does that make sense?

[00:25:51] Okay, let's move on and watch a full video of how a class activation map works. [00:25:55] This video is from Kyle McDonald, and it's live, so it's very quick; you can see that the network is looking at the speedboat. [00:26:30] Okay. So now, the three methods we've seen are methods that roughly map the output back to the input space, helping us visualize which parts of the input were the most discriminative in leading to this output and the decision of the network. Now we're going to try to delve into more detail, into the intermediate layers of the network, and try to interpret how the network sees our world, not necessarily related to a specific input, but in general. [00:27:03] Okay, so the pet shop now trusts your model, because you've used occlusion sensitivity, saliency maps, and class activation maps to show that the model is looking at the right place. But they got a little scared when you did that, and they asked you to explain what the model thinks a dog is.
model thinks a dog is [00:27:20] explain what the model thinks a dog is so you have this trained convolutional [00:27:22] so you have this trained convolutional neural network and you have an output [00:27:25] neural network and you have an output probability yep let me take one non [00:27:34] probability yep let me take one non image data that's that's a good question [00:27:36] image data that's that's a good question it's actually so the reason we're seeing [00:27:38] it's actually so the reason we're seeing images what most of the research has [00:27:39] images what most of the research has been focusing on images if you look at [00:27:43] been focusing on images if you look at electric time series data [00:27:45] electric time series data so either speech or natural language the [00:27:48] so either speech or natural language the main way to visualize those is with the [00:27:51] main way to visualize those is with the attention method are you familiar with [00:27:54] attention method are you familiar with that so in the next set of modules that [00:27:56] that so in the next set of modules that you're going to start this week and [00:27:57] you're going to start this week and you're going to study in the next two [00:27:59] you're going to study in the next two weeks you will see a visualization [00:28:00] weeks you will see a visualization method called attention models which [00:28:03] method called attention models which will tell you which part of a sentence [00:28:05] will tell you which part of a sentence was important let's say to output a [00:28:08] was important let's say to output a number like assuming you're doing [00:28:11] number like assuming you're doing machine translation you know some [00:28:13] machine translation you know some languages they don't have a direct [00:28:14] languages they don't have a direct one-to-one mapping it means I might say [00:28:16] one-to-one mapping it means I might say I love cats but in another language 
[00:28:20] I love cats but in another language maybe this same sentence would be [00:28:22] maybe this same sentence would be attached I love or something it's fit [00:28:24] attached I love or something it's fit and you want an attention model to seek [00:28:27] and you want an attention model to seek to show you that the cat was referring [00:28:29] to show you that the cat was referring to the second I think it's okay sorry [00:28:33] to the second I think it's okay sorry guys [00:28:38] so going back to the presentation now [00:28:41] so going back to the presentation now we're going to delve into inside the [00:28:44] we're going to delve into inside the network and so the new thing is the pet [00:28:47] network and so the new thing is the pet shop is little scared and ask you to [00:28:49] shop is little scared and ask you to explain what the network think a dog is [00:28:51] explain what the network think a dog is what's the representation of dog for the [00:28:53] what's the representation of dog for the network so here we're going to use a [00:28:55] network so here we're going to use a method that we've already seen together [00:28:56] method that we've already seen together called gradient ascent which is defining [00:29:00] called gradient ascent which is defining an objective that is technically the [00:29:04] an objective that is technically the score of the dog - a regularization term [00:29:07] score of the dog - a regularization term what the regularization term is doing is [00:29:09] what the regularization term is doing is it's saying that X should look natural [00:29:11] it's saying that X should look natural it's not necessarily l2 regularization [00:29:13] it's not necessarily l2 regularization can be something else and we will [00:29:16] can be something else and we will discuss it in the next slide but don't [00:29:18] discuss it in the next slide but don't think about it right now what we will do [00:29:20] think about it right now what we 
will do is we will compute the back propagation [00:29:22] is we will compute the back propagation of this objective function all the way [00:29:24] of this objective function all the way back to the input and perform gradient [00:29:27] back to the input and perform gradient ascent to find the image that maximizes [00:29:29] ascent to find the image that maximizes the score of the dog so it's an [00:29:31] the score of the dog so it's an iterative process takes longer than the [00:29:33] iterative process takes longer than the class activation map and we repeat the [00:29:37] class activation map and we repeat the process forward propagate X compute the [00:29:39] process forward propagate X compute the objective back propagate and update the [00:29:41] objective back propagate and update the pixels and so on you guys are familiar [00:29:42] pixels and so on you guys are familiar with that so let's see what what what we [00:29:45] with that so let's see what what what we can visualize doing that so actually if [00:29:48] can visualize doing that so actually if you take an image net classification [00:29:50] you take an image net classification network and you perform this on the [00:29:52] network and you perform this on the classes of goose or ostrich or Kitfox [00:29:54] classes of goose or ostrich or Kitfox Husky Dalmatians you can see what the [00:29:57] Husky Dalmatians you can see what the network is looking at or what the [00:29:59] network is looking at or what the network think that almassian is so for [00:30:02] network think that almassian is so for the Dalmatian you can see some some [00:30:04] the Dalmatian you can see some some black dots on a white background somehow [00:30:07] black dots on a white background somehow but these are still quite hard to [00:30:10] but these are still quite hard to interpret it's not super easy to see and [00:30:12] interpret it's not super easy to see and even worse here on the screen better on [00:30:15] even worse 
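The loop (forward propagate x, compute the objective, backpropagate to the input, update the pixels) can be sketched with a toy differentiable score so the gradient is available in closed form; in a real network the gradient would come from backpropagation, and every name here is hypothetical:

```python
import numpy as np

# Toy stand-in for "score of class dog": a linear scorer s(x) = sum(w * x),
# so d(objective)/dx has a closed form. In a real network you would get
# d(score)/d(input) by backpropagation through the frozen weights instead.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))                  # pretend 8x8 "image" input

def objective_grad(x, lam=0.5):
    """Gradient of  s(x) - lam * ||x||^2  with respect to the input x."""
    return w - 2.0 * lam * x

x = np.zeros((8, 8))                         # start from a blank image
for _ in range(200):
    x += 0.1 * objective_grad(x)             # gradient ASCENT on the pixels

# For this toy objective the maximizer is w / (2 * lam) = w; check we got there.
print(np.abs(x - w).max() < 1e-6)            # True
```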
But you can see a fox: here you can see orange color for the fox, which means that pushing the pixels toward an orange color would actually lead to a higher score for the kit fox at the output. [00:30:28] If you use a better regularization than L2 you might get better pictures: this one is for flamingo, this one for pelican, and this one for hartebeest. A few interesting things to see: in order to maximize the score of flamingo, what the network visualized is many flamingos. It means that ten flamingos lead to a higher score for the class flamingo than one flamingo does, for the network. [00:30:57] Talking about regularization, what does L2 regularization say? It says that for visualizing we don't want extreme pixel values: it doesn't help much to have one pixel with an extreme value, one pixel with a low value, and so on. So we're going to regularize all the pixels so that all the values stay close to each other, and then we can rescale them between 0 and 255 if we want. One thing to notice is that the gradient ascent process doesn't constrain the inputs to be between 0 and 255 (you can go to plus infinity, potentially), while an image is stored with numbers between 0 and 255, so you might want to clip as well; this is another type of regularization. [00:31:38] One thing that led to beautiful pictures is what Jason Yosinski and his team did: they forward propagated an image, computed the score, computed the objective function, backpropagated, updated the pixels, and then blurred the picture. Because what is not useful for visualizing is high-frequency variation between pixels: it doesn't help to visualize if you have many pixels close to each other that have many different values; instead you want a smooth transition among pixels. This is another type of regularization, called Gaussian blurring. [00:32:15] Okay, so this method actually makes a lot of sense in scientific terms: you're maximizing an objective function that gives you what the network sees as a flamingo, which would maximize the score of flamingo. So we also call it class model visualization. [00:32:41] Yes? The question is whether a more realistic class model visualization corresponds to a more accurate model. It's hard to judge the accuracy of the model from this visualization; it's a good way to validate that the network is looking at the right thing. We're going to see more of this later. [00:33:01] I think the most interesting part is actually on this slide: we did it for the class score, but we could have done it with any activation. Let's say I stop in the middle of the network and I define my objective function to be this activation.
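The two extra regularizers mentioned a moment ago, clipping pixels back to the displayable [0, 255] range and occasionally blurring the image between updates, can be folded into a single ascent step. A sketch, where the blur is a tiny separable kernel standing in for a real Gaussian blur and all names are illustrative:

```python
import numpy as np

def blur(img):
    """Tiny separable blur, a stand-in for a real Gaussian blur."""
    k = np.array([0.25, 0.5, 0.25])
    img = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, img)
    img = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, img)
    return img

def ascent_step(x, grad, lr=1.0, do_blur=False):
    """One gradient-ascent update with the two regularizers above:
    clip pixels to [0, 255], and optionally blur to suppress
    high-frequency pixel-to-pixel variation."""
    x = np.clip(x + lr * grad, 0.0, 255.0)
    return blur(x) if do_blur else x

x = np.full((6, 6), 128.0)
g = np.zeros((6, 6))
g[3, 3] = 1e6                                # a huge gradient at one pixel
x = ascent_step(x, g)                        # clipped: stays at 255, not 1e6
x2 = ascent_step(x, np.zeros_like(x), do_blur=True)
print(x.max(), x2[3, 3] < x[3, 3])           # 255.0 True: blur spreads the spike
```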
I'm going to backpropagate and find the input that maximizes this activation. That will tell me what this activation fires for, which I think is even more interesting than just looking at the input. Does it make sense that we could do it on any activation? Yep. Any questions on that? [00:33:49] Okay, so now we're going to use another trick, which is dataset search. It's actually one of the most useful ones, I think; not fast, but very useful. [00:33:55] The pet shop loved the previous technique, and asks if there are other alternatives to show what an activation in the middle of a network is thinking. You take an image, forward propagate it through the network, and get your output. Now what you're going to do is select a feature map. Let's say we're at this layer, and the feature maps are of size 5 x 5 x 256; it means the conv layer had 256 filters, right? [00:34:36] You're going to select one of the 256 feature maps, and you're going to run a lot of data forward through the network and look at which data points had the maximum activation of this feature map. [00:34:55] Let's say we do it with the first feature map, and we notice that these are the top five images that really fired it, with high activations on the feature map. What it tells us is that probably this feature map is detecting shirts. We could do the same thing with the second feature map: we look at which data points maximized the activations of this feature map over a lot of data, and these are the top five images we got. It probably means that this other feature map seems to be activated when seeing edges.
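Dataset search boils down to scoring every image by the activation of one chosen feature map and sorting. A toy numpy sketch, where `feature_map_activation` is a hypothetical stand-in for the forward pass up to the chosen layer (here, a crude "edge detector"):

```python
import numpy as np

rng = np.random.default_rng(1)

def feature_map_activation(img):
    fmap = np.abs(np.diff(img, axis=1))      # (H, W-1) pretend feature map
    return fmap.max()                        # summarize by its peak activation

# Mostly smooth images, plus one (index 42) with a hard vertical edge.
dataset = [np.full((8, 8), rng.uniform(0, 255)) + rng.normal(0, 3, (8, 8))
           for _ in range(100)]
edge = np.zeros((8, 8))
edge[:, :4] = 255.0
dataset[42] = edge

scores = np.array([feature_map_activation(img) for img in dataset])
top5 = np.argsort(scores)[::-1][:5]          # indices of the 5 strongest images
print(top5[0])                               # 42: the hard-edge image wins
```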
So the second feature map is much more likely to appear earlier in the network than later, obviously. [00:35:44] One thing that you may ask is that these images seem cropped: I don't think this was an image in the dataset; it's probably a sub-part of an image. What do you think this crop corresponds to? Any idea how we cropped the images, and why they are cropped? Why didn't I show you the full images; how was I able to show you the cropped part? [00:36:33] That's correct. So let's say we pick an activation in the network. For a convolutional neural network, this activation often doesn't see the entire input image; what it sees is a subspace of the input image. Does that make sense? So let's look at another slide. [00:36:57] Here we have a picture of Younes, 64 x 64 x 3; it's our input. We run it through a five-layer convnet, and now we get an encoding volume that is much smaller in height and width but bigger in depth. If I ask what this activation is seeing, you can map it back: you look at the stride and the filter sizes you've used, and you could say that this is the part that this activation is seeing. It means the pixel that was up there had no influence on this activation. And it makes sense when you think of it; the easiest way to think about it is to look at the top-left entry of the encoding volume. You have the input image, you put a filter here, and this filter gives you one number. This number, this activation, only depends on this part of the image. But then if you add a convolution after it, it will take more filters, and so the deeper you go, the larger the part of the image the activation will see. So if you look at an activation in layer 10, it will see a much larger part of the input than an activation in layer 1. Does that make sense?
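The growth of the receptive field with depth follows a standard recursion, which can be checked in a few lines of Python; the layer configurations below are made up for illustration:

```python
def receptive_field(layers):
    """Receptive-field size of one unit after a stack of conv/pool layers.

    layers: list of (filter_size, stride) pairs. Standard recursion: each
    layer widens the field by (f - 1) * jump, where `jump` is the distance,
    in input pixels, between adjacent units of the current layer.
    """
    r, jump = 1, 1
    for f, s in layers:
        r += (f - 1) * jump
        jump *= s
    return r

# A unit one 3x3/stride-2 layer deep vs. five such layers deep:
print(receptive_field([(3, 2)]))        # 3 pixels of the input
print(receptive_field([(3, 2)] * 5))    # 63 pixels: deeper units see far more
```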
[00:38:11] So that's why the pictures that I showed here are very small crops of the image, which means the activation I was talking about is probably early in the network: it sees a much smaller part of the input. [00:38:32] Yeah, the question is how we know which part of the image one activation responds to. What you look at is which activation was maximum; you look at this one, and then you match this one back to the crop. Makes sense? And here again, same thing: this one would correspond more to the center of the image. This intuition makes sense? Okay, good. [00:39:09] So let's talk about deconvolution now. It's going to be the hardest part of the lecture, but it will probably help with more intuition on the convolution. You remember the generative adversarial networks scheme, and we said that, given a code, the generator is able to output an image. There is something happening here that we didn't talk about: how can we start with a 100-dimensional vector and output a 64 x 64 x 3 image? That seems weird. You might say we could use a fully connected layer with a lot of neurons to upsample; in practice this is one method. Another one is to use a deconvolution network. [00:39:54] Convolutions encode the information in a volume that is smaller in height and width and deeper in depth, while the deconvolution does the reverse: it upsamples the height and width of an image. So it would be useful in this case. [00:40:13] Another case where it would be useful is segmentation. You remember our case study for segmentation: live-cell microscopic images of cells. You give them to a convolutional network, and it's going to encode them, so it's going to lower the height and width. The interesting thing about this encoding in the middle is that it holds a lot of meaningful information, but what we ultimately want is a segmentation mask, and the segmentation mask has to have the same height and width as the input image, so we need a deconvolution network to upsample it. So deconvolutions are used in these cases. [00:40:50] Today the case we're going to talk about is visualization. Remember the gradient ascent method we talked about: we define an objective function by choosing an activation in the middle of the network, we want the objective to be equal to this activation, and we find the input image that maximizes this activation through an iterative process. Now we don't want to use an iterative process; we want a reconstruction of this activation directly in the input space, in one backward pass. [00:41:17] So let's say I select this feature map out of the 5 x 5 x 256 volume. What I'm going to do is identify the max activation of this feature map. Here it is, this one, third column, second row. I'm going to set all the others to zero; just this one I keep, because it seems that this one has detected something, and I don't want to talk about the others. I'm going to try to reconstruct, in the input space, what this activation has fired for. [00:41:57] So I'm going to compute the reverse mathematical operations of pooling, ReLU and convolution: I will unpool, "un-ReLU" (let's say; the word doesn't exist, so don't use it) and deconv, and I will do it several times, because this activation went through several of them. I do it again and again, until I see that this specific activation that I selected in the feature map fired because it saw the ears of the dog. And as you see, this image is cropped again: it's not the entire image, it's just the part that the activation has seen. And if you look at where the activation is located on the feature map, it makes sense that this is the part that corresponds to it. [00:42:45] So that's the higher-level intuition. We're going to delve into it and see what we mean by unpool, what we mean by un-ReLU, and what we mean by deconv. Okay. [00:42:57] Yes? The question is what happens if we leave the other activations at whatever values they were and reconstruct the whole image. If we don't zero out all the other activations, the reconstruction would be messier. It doesn't necessarily mean you won't get the full image, because the other activations probably didn't even fire, meaning they didn't detect anything else; it's just that it's going to add some noise to this reconstruction.
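A 1D toy version of the reversed path, under the usual assumptions: max-pooling remembers "switches" (where each max came from), un-ReLU just re-applies ReLU, and the deconv is the transpose of the forward convolution. Everything below is an illustrative sketch, not the lecture's exact pipeline:

```python
import numpy as np

def maxpool_with_switches(x, size=2):
    xs = x.reshape(-1, size)
    return xs.max(axis=1), xs.argmax(axis=1)   # values + remembered positions

def unpool(y, switches, size=2):
    out = np.zeros((y.size, size))
    out[np.arange(y.size), switches] = y       # put each max back in its slot
    return out.reshape(-1)

def deconv1d(y, w):
    # The transpose of a 'valid' stride-1 cross-correlation with w is a
    # 'full' convolution with w: length n goes back to n + len(w) - 1.
    return np.convolve(y, w, mode="full")

w = np.array([1.0, 2.0, 1.0])
x = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 0.0])
h = np.convolve(x, w[::-1], mode="valid")      # forward conv: length 4
p, sw = maxpool_with_switches(h)               # forward pool: length 2
r = deconv1d(np.maximum(unpool(p, sw), 0.0), w)  # unpool -> un-ReLU -> deconv
print(r.shape)                                 # (6,): back to the input length
```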
that it's gonna is going to add some noise to this reconstruction [00:43:30] add some noise to this reconstruction okay so let's talk about the convolution [00:43:32] okay so let's talk about the convolution a little bit on the board so to start [00:43:38] a little bit on the board so to start with the convolution and you guys can [00:43:43] with the convolution and you guys can take notes if you want we're going to [00:43:44] take notes if you want we're going to spend about 20 minutes on the board now [00:43:46] spend about 20 minutes on the board now to discuss the convolution okay to [00:43:55] to discuss the convolution okay to understand the deconvolution we first [00:43:57] understand the deconvolution we first need to understand the convolution we've [00:43:59] need to understand the convolution we've seen it from a computer science [00:44:02] seen it from a computer science perspective but actually what we're [00:44:04] perspective but actually what we're going to do here is we're going to frame [00:44:06] going to do here is we're going to frame the convolution as a simple matrix [00:44:09] the convolution as a simple matrix vector mathematical operation I'm going [00:44:12] vector mathematical operation I'm going to see that it's actually possible so [00:44:14] to see that it's actually possible so let's start with a 1d come for the 1d [00:44:26] let's start with a 1d come for the 1d convolution I will take an input X which [00:44:29] convolution I will take an input X which is of size 12 X 1 X 2 X 3 X 4 X 5 X 6 X [00:44:39] is of size 12 X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 so 8 plus 2 padding which gives me [00:44:44] 7 X 8 so 8 plus 2 padding which gives me the 12 that I mentioned so the input is [00:44:48] the 12 that I mentioned so the input is a one-dimensional vector which has [00:44:52] a one-dimensional vector which has padding of 2 on both sides I will give [00:44:57] padding of 2 on both sides I will give it to a layer that will be a 1d comm and 
[00:45:01] it to a layer that will be a 1D conv, and this layer will have only one filter. The filter size will be four, and we will also use a stride equal to two. So my first question is: what's the size of the output? Can you guys compute it on your notepad and tell me the size of the output? Input size twelve (the twelve entries on the board are x1 through x8 plus two padded zeros on each side), filter of size four, stride of two, padding of two. Five? Yeah, I heard it, yeah. So remember, you use n_y equals n_x minus f plus 2p, divided by the stride, plus one, and you will get five. So what I'm going to get is y1, y2, y3, y4, y5.
[00:46:21] So I'm going to focus on this specific convolution for now, and I'm going to show that we can define it as a mathematical operation between a matrix and a vector. The way to do it, I guess the easiest way, is to write the system of equations that is underlying here. What is y1? y1 is the filter applied to the first four values here. Does it make sense? So if I define my filter as being w1, w2, w3, w4, what I'm going to get is that y1 = w1·0 + w2·0 + w3·x1 + w4·x2. This makes sense: it's just a convolution, an element-wise operation, and then you sum all of it.
[00:47:25] y2 is going to be the same thing, but we just slide down by the stride of two, so it's going to give me y2 = w1·x1 + w2·x2 + w3·x3 + w4·x4. Correct? Everybody's following? Now, same thing, we will do it for all the y's until y5, and we know that y5 is the element-wise operation between the filter and the four last numbers here, summed, so it will give me y5 = w1·x7 + w2·x8 + w3·0 + w4·0.
[00:48:28] Okay, now what we're going to do is try to write down y as a matrix-vector operation between W and x. We need to find what this W matrix is, and looking at the system of equations, it seems that it's not impossible.
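The numbers being worked out on the board can be checked directly; here is a minimal NumPy sketch, with made-up values for x and the filter w (the lecture never fixes them):

```python
import numpy as np

# Toy 1D example from the lecture: x has 8 entries, padding 2, filter 4, stride 2.
# The values of x and w are made up for illustration.
x = np.arange(1.0, 9.0)              # x1..x8
w = np.array([1.0, -2.0, 3.0, 0.5])  # w1..w4

f, p, s = len(w), 2, 2
n_y = (len(x) - f + 2 * p) // s + 1  # (8 - 4 + 4) / 2 + 1 = 5
x_pad = np.pad(x, p)                 # [0, 0, x1, ..., x8, 0, 0] -> 12 entries

# Slide the filter with stride 2: y_i = sum_k w_k * x_pad[s*i + k]
y = np.array([w @ x_pad[s * i : s * i + f] for i in range(n_y)])

print(n_y)   # 5
print(y[0])  # w3*x1 + w4*x2, since w1 and w2 hit the padded zeros
```

The first and last outputs only touch two real entries of x; the rest of the filter lands on the padded zeros, exactly as in the y1 and y5 equations above.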
So let's try to do it. I will write my y vector here, y1 through y5, I will write my matrix here, and my vector x here. So the first question is: what do you think will be the shape of this W matrix?
[00:49:17] Five by twelve, correct. We know that this y is five by one and this x is twelve by one, so of course W is going to be five by twelve. So now let's try to fill it in. The x vector is 0, 0, x1, x2, x3, all the way to x8, 0, 0. Can you guys see in the back? Yeah? Okay, cool. So I'm going to fill in this matrix according to the system of equations. I know that y1 should be w1·0 + w2·0 + w3·x1 + w4·x2, and this vector is going to multiply the first row here, so I just have to place my w's: w1 will come here to multiply the first 0, w2 will come here, w3 will come here, and w4 will come here, and all the rest will be filled in with zeros, right? I don't want any more multiplications. How about the second row of this matrix? I know that y2 has to be equal to the dot product of this row with this vector, and I know that it's going to give me w1·x1 + w2·x2 + w3·x3 + w4·x4. x1 is the third entry of this vector, so I need to shift what I had in the previous row by the stride of two, and that will give me the second row. Does it make sense? If I take the dot product of this row with that, I should get the second equation up there, and so on. And you understand what happens, right? This pattern, we just slide it over by two each time, so I get zeros here, then my w1, w2, w3, w4, and then zeros, and all the way down here what I get is zeros and then w1, w2, w3, w4 ending in the bottom-right corner. So the only thing I want to mention here is that the convolution operation, as you see, can be framed as a simple matrix times a vector. Yes?
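The banded matrix being filled in on the board can be built programmatically; a sketch with the same made-up values as before, where each row is the filter shifted right by the stride:

```python
import numpy as np

# Build the 5x12 matrix W so that y = W @ x_pad reproduces the strided
# convolution (values of x and w are made up for illustration).
x = np.arange(1.0, 9.0)
w = np.array([1.0, -2.0, 3.0, 0.5])
f, p, s = 4, 2, 2
x_pad = np.pad(x, p)                   # 12 entries
n_y = (len(x) - f + 2 * p) // s + 1    # 5

W = np.zeros((n_y, len(x_pad)))        # 5 x 12
for i in range(n_y):
    W[i, s * i : s * i + f] = w        # each row is w shifted by the stride

# Direct strided convolution for comparison
y_conv = np.array([w @ x_pad[s * i : s * i + f] for i in range(n_y)])
assert np.allclose(W @ x_pad, y_conv)  # the conv really is a matrix-vector product
```

The first row starts with w1..w4 followed by zeros, and the last row ends with w1..w4 in the bottom-right corner, matching the board.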
The question is: in the top row, should the zeros be on the left? Why are the zeros on the right side? It's because I don't want y1 to be dependent on x3 through x8, so I want those entries to be multiplied by zeros.
[00:52:00] Okay, so why is this important? The intuition behind the deconvolution, and the reason the deconvolution exists, is that if we managed to write down y = Wx, we can probably write down x = W^-1 y, if W is an invertible matrix, and this is going to be our deconvolution. And in fact, what's the shape of this new matrix?
[00:52:45] Yes, twelve by five: we have twelve by one on one side and five by one on the other, so it has to be twelve by five. So it's flipped compared to W.
[00:52:58] So one thing we're going to do here is make an assumption. The first assumption is that W is an invertible matrix, and on top of that we're going to make a stronger assumption, which is that W is an orthogonal matrix. And without going into the details
here, the same as when we proved Xavier initialization in section, we are making assumptions that are not always true; this assumption is not always going to be true either. One intuition you can have: suppose the filter is an edge detector, like +1, 0, 0, -1. In this case the matrix can be orthogonal. Why? A matrix being orthogonal means that if I take two of the columns and dot-product them together, it should give me zero, and same with the rows. You can see it: what's interesting is that if the stride were four, there would be no overlap between two consecutive rows at all, which would give me an orthogonal matrix here. But let's try these two rows: if I replace the w's by +1, 0, 0, -1, you can see that the dot product would be zero; the zeros multiply the ones and the ones multiply the zeros, giving me a zero dot product. So this is a case where it works; in practice it doesn't always work. The reason we're making this assumption is that we want to compute a reconstruction, right? We want to be able to invert this W, and the reconstruction is not going to be exact, but as a first-order approximation we can assume that the reconstruction will still be useful to us, even if the assumption is not always true. In the case where W is orthogonal, I know that the inverse of W is W transpose; another way to write it is that, for orthogonal matrices, W^T W is the identity matrix. So what it tells me is that x is going to be W^T y.
[00:55:40] So let's see what we get from that; let me write it down. Let's say now we have our y and we want to regenerate our x using this method.
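The shapes of the reconstruction step x = W^T y are easy to sanity-check in code; note that with an arbitrary made-up filter W is not actually orthogonal, so this is only the approximate reconstruction the lecture describes:

```python
import numpy as np

# Reconstruction under the (strong) orthogonality assumption: x_hat = W.T @ y.
# With a generic filter W is NOT orthogonal, so this is only approximate;
# the point here is the shapes and the structure, not an exact inverse.
x = np.arange(1.0, 9.0)
w = np.array([1.0, -2.0, 3.0, 0.5])
f, p, s = 4, 2, 2
x_pad = np.pad(x, p)
n_y = (len(x) - f + 2 * p) // s + 1

W = np.zeros((n_y, len(x_pad)))
for i in range(n_y):
    W[i, s * i : s * i + f] = w

y = W @ x_pad       # forward conv: (5,12) @ (12,) -> (5,)
x_hat = W.T @ y     # "deconv": (12,5) @ (5,) -> (12,), same size as x_pad
print(W.T.shape, x_hat.shape)
```

W^T is twelve by five, flipped compared to W, exactly as answered in the lecture.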
What I would write is this: to understand the 1D deconv, we can use the following illustration, where we have x here, which is 0, 0, x1, x2, x3, all the way down to x8, and I have my matrix W^T here and my y vector, y1 through y5, here. I know that this matrix is the transpose of the one I had before, so I can just write down the transpose: it will be w1, w2, w3, w4 written downward, shifted by the stride of two from one column to the next, and so on, and this whole thing will be W^T.
[00:57:41] So the small issue here is that this, in practice, is going to be very similar to a convolution, but a tiny bit different in terms of implementation. So another question I might ask is: how can we do the same thing with the same pattern as we have here, meaning with the stride going from left to right instead of from top to bottom?
[00:58:11] I'm going to introduce that with a technique called the sub-pixel convolution. For those of you who read papers in segmentation or in visualization, oftentimes this is the type of convolution that is used for reconstruction. So let's see how it works. I want to do the same operation, but instead of doing it with a stride going from top to bottom, I want to do it with a stride going from left to right.
[00:58:47] One thing you want to notice here is that the two lines that I wrote here are cropped, and the reason is that we're using a padded input: we just crop the two top lines, and the same for the two last lines, they will be cropped. Look at that: this w1 would multiply y1, this one would multiply y2, and so on, so this dot product would give me w1 times y1; but I don't want that to happen, because I want to get the padded zero here. So we just drop that row, and this matrix is actually going to be smaller than it seems; it's going to generate my x1 through x8, and then I will pad the top values and the bottom values. Okay, it's just a hack.
[00:59:46] So let's look at the sub-pixel convolution. I have my input, and now we do something quite fun: I will perform a sub-pixel operation on y. What does that mean? I will insert zeros almost everywhere: I will get 0, 0, y1, 0, y2, 0, y3, 0, y4, 0, y5, 0, 0. So this vector is just the vector y with some zeros inserted around it, and also in the middle, between the elements of y. Now why is that interesting? It's interesting because I can now write down my deconvolution as a convolution by flipping my weights.
[01:01:08] So let me explain a little bit what happened here. What we wanted, in order to be able to efficiently compute the deconvolution the same way as we've learned to compute the convolution, is
to have the weights scattered from left to right, with the stride moving from left to right. What we did is that we used a sub-pixel version of y, by inserting zeros in the middle, and we divided the stride by two: instead of the stride of two we had in our convolution, we have a stride of one in our deconvolution. So notice that I shift my weights by one at every step when I move from one row to another. The second thing is that I flipped my weights: instead of having w1, w2, w3, w4, I now have w4, w3, w2, w1. And look at this first row that is not cropped: the result of the dot product of this row with this vector is going to be y1·w3 + y2·w1. Yeah? Now let's look at what happened over here. I look at my first row here, and the dot product of this first row with my y is going to be, sorry, these two are cropped, the same as here. So taking my first non-cropped row here as a dot product with this vector, what I get is w3·y1 + w1·y2, exactly the same thing as I got there. So these two operations are exactly the same: they're the same thing, you get the same result, it's two different ways of doing it. One is using a weird operation with strides going from top to bottom, and the second one is exactly a convolution: a convolution with flipped weights, plus insertion of zeros for the sub-pixel version of y,
[01:03:41] and on top of that, padding here and there. So this was the hardest part, okay? Does it give you more intuition about the deconvolution? You now know how a convolution can be framed as a mathematical operation between a matrix and a vector, and you also know that, under these assumptions, the way we deconvolve is just by flipping our weights, dividing the stride by two, and inserting zeros. If we just do that, we're deconvolving. For a convolution that forward-propagates in the usual way, if you want to deconvolve, just flip all the weights, insert zeros (the sub-pixel step), and finally divide the stride, and that's the deconvolution. It's a complex thing to understand, but this is the intuition behind it. Now let's try to get an intuition of how it works in two dimensions; let me write it down.
[01:04:54] Why do we use this? Because in terms of implementation, this one is exactly the same operation as the convolution, just with flipped weights, insertion of zeros, and a divided stride, while the other one is a different implementation. You could do both, it's the same operation, but in practice this one is easier to understand. That's why I wanted to show it.
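The equivalence just derived, that W^T y equals a plain stride-1 convolution of the zero-stuffed y with the flipped filter, up to cropping the rows that correspond to the padded positions, can be verified numerically; filter and y values are made up:

```python
import numpy as np

# Two implementations of the 1D "deconv" from the lecture, compared:
# (1) the transposed-matrix version W.T @ y, and
# (2) a stride-1 convolution of the zero-stuffed ("sub-pixel") y with the
#     FLIPPED filter.  Values are made up for illustration.
w = np.array([1.0, -2.0, 3.0, 0.5])
y = np.array([2.0, -1.0, 0.5, 4.0, 3.0])  # some conv output y1..y5
f, s, n_pad = 4, 2, 12                    # filter size, original stride, len of padded x

W = np.zeros((len(y), n_pad))
for i in range(len(y)):
    W[i, s * i : s * i + f] = w
x_hat = W.T @ y                           # transposed-matrix version

# Sub-pixel version: [0, 0, y1, 0, y2, 0, y3, 0, y4, 0, y5, 0, 0]
y_up = np.zeros(2 * len(y) + 3)
y_up[2:2 + 2 * len(y):2] = y
w_flip = w[::-1]                          # w4, w3, w2, w1
z = np.array([w_flip @ y_up[j : j + f] for j in range(len(y_up) - f + 1)])

# Interior entries (the x1..x8 positions) agree between the two versions.
assert np.allclose(x_hat[2:10], z[1:9])
```

The cropped rows at the top and bottom are exactly the entries dropped by the slicing in the final comparison.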
[01:05:24] When the assumption doesn't hold? Yeah, so oftentimes the assumption doesn't hold, but what we want is to be able to see a reconstruction, and if we use this method we will still see a reconstruction in practice. If we really had W^-1, the reconstruction would be much better, but we don't. So let me go over the 2D example; we're going to run a little over time, because we technically have two hours for one hour and fifty minutes of lecture. Let me go over the 2D example, and then we will answer this question of why we need to make this assumption.
[01:06:06] So here is the interpretation of the 2D deconvolution; let me write it down here.
[01:06:17] The intuition behind the 2D deconv is this: I get my input, which is 5 by 5, and I call it x. I forward-propagate it using a filter of size 2 by 2 in a conv layer, and a stride of 2. This is my convolution. What do I get? You do 5 minus 2, plus the padding, which is 0, divided by 2, plus 1 (oh, I forgot the plus 1 here), and you floor it. So 5 minus 2 gives you 3, divided by 2, plus 1; no, actually it will give you 3 by 3, yeah, a y of 3 by 3, that's what you get. And now this, you call it y. What you're going to do here is deconvolve y. In order to deconvolve it, you're going to use a stride of 1, because what we said is that we need to divide the stride by 2, right? So we need a stride of 1, and the filter will be the same, 2 by 2. And you remember that what we've seen is that the filter is the same, it's just going to be flipped: so you will use a filter of 2 by 2, but flipped.
[01:07:54] And now what do we get? We hope to get a 5 by 5 output, which is going to be our reconstructed x, a 5 by 5 input, and the way we're going to do it, this is the intuition behind it. Yeah? Okay, up to
my 2, thanks. Yeah, 5 by 5 here, that's what we hope to reconstruct. The way we will do it: we take the filter, which is 2 by 2, we put it here, and we multiply all the weights of this filter by y11; all the weights get multiplied by y11, so we get four values here, which are going to be w4·y11, w3·y11, and so on. Now I shift this with a stride of 1, I put my filter again here, and I multiply all the entries by y12, and so on. And you see that this entry has an overlap, so it will be updated at every step of the convolution; it's not like what happened in the forward pass. So this is the intuition behind the 2D deconvolution.
[01:09:21] In 3D, same thing: you have a volume here, so your filter is going to be a volume. What you're going to do is put the volume here, multiply all the weights of the filter by y111, and so on; and then if you have a second filter, you put it again on top and multiply all its weights the same way. It's a little complicated, but this is the intuition behind the deconvolution. Okay, let's get back to the lecture; I'm going to take one question here if you guys need clarification.
[01:09:59] No worries if you don't understand the deconvolution fully; the important part is that you get the intuition and you understand how we use it. So let me make a comment: why do we need to make this assumption, and when do we need to make it? When we want to reconstruct, as we're doing here in the visualization, we need to make this assumption because we don't want to retrain weights for the deconvolutional network. What we know is that the activation we selected here on the feature map has gone through the entire pipeline of the convnet, so to reconstruct, we need to use the weights that we already have in the convnet.
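The 2D intuition above, stamping a copy of the flipped filter scaled by each entry of y and summing the overlaps, can be sketched as a scatter loop; sizes and values here are made up, and the exact output size depends on the padding convention, which the board drawing leaves implicit:

```python
import numpy as np

# Overlap-add ("scatter") view of the 2D deconvolution sketched on the board:
# every entry y[i, j] stamps a copy of the flipped filter, scaled by y[i, j],
# into the output, and overlapping stamps accumulate.
def deconv2d_scatter(y, w_flip, stride):
    h, ww = y.shape
    f = w_flip.shape[0]
    out = np.zeros(((h - 1) * stride + f, (ww - 1) * stride + f))
    for i in range(h):
        for j in range(ww):
            # overlapping regions are updated at every step, unlike the forward pass
            out[i * stride:i * stride + f, j * stride:j * stride + f] += y[i, j] * w_flip
    return out

y = np.arange(1.0, 10.0).reshape(3, 3)     # a 3x3 feature map
w = np.array([[1.0, -1.0], [0.5, 2.0]])
x_hat = deconv2d_scatter(y, w[::-1, ::-1], stride=1)
print(x_hat.shape)                          # (4, 4) with stride 1; padding would grow it
```

With stride 1, interior output entries receive contributions from up to four stamps, which is the overlap the lecturer points out.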
We need to pass them to the deconvolution and reconstruct. If we're doing segmentation, like we talked about earlier, we don't need this assumption: we're just saying that this procedure is the deconvolution, and we will train the weights of the deconvolution, so there is no need to make the assumption. It's just a technique that divides the stride and inserts zeros, and then we train the weights and we get an output that is an upsampled version of the input that was given to it. So there are two use cases, one where you reuse the weights and one where you don't; in this case we don't want to retrain, we want to use the weights. So let's see a more visual version of the upsampling. We do the sub-pixel image: this is my image, 4 by 4; I insert zeros and I pad it, and I get a 9 by 9 image, and I have my filter.
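This visual version, a 4 by 4 image with zeros inserted and padding giving 9 by 9, then an ordinary stride-1 convolution giving a 6 by 6 output, can be sketched as follows (image and filter values are made up):

```python
import numpy as np

# Sub-pixel upsampling as described: zero-stuff a 4x4 image, pad it to 9x9,
# then run an ordinary stride-1 convolution with a 4x4 filter to get the
# 6x6 upsampled output.  Image and filter values are made up.
img = np.arange(1.0, 17.0).reshape(4, 4)
w = np.ones((4, 4))

up = np.zeros((7, 7))      # insert zeros between the 4x4 pixels
up[::2, ::2] = img
up = np.pad(up, 1)         # pad by one on each side -> 9x9

f = w.shape[0]
out = np.array([[np.sum(w * up[i:i + f, j:j + f])
                 for j in range(up.shape[1] - f + 1)]
                for i in range(up.shape[0] - f + 1)])
print(up.shape, out.shape)  # (9, 9) (6, 6)
```

Each output entry only sees the non-zero image pixels under the window, which is why different subsets of filter weights (the colors on the slide) generate different output positions.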
filter like that [01:11:31] by nine image I have my filter like that and this filter will convolve I will it [01:11:35] and this filter will convolve I will it would convolve over the input so I would [01:11:36] would convolve over the input so I would place it on my input and at every step I [01:11:39] place it on my input and at every step I would perform a convolution up I will [01:11:41] would perform a convolution up I will get a value here the value is blue [01:11:43] get a value here the value is blue because as you can see the weights that [01:11:44] because as you can see the weights that affected the output were only the blue [01:11:47] affected the output were only the blue weights I would use a stride of one beam [01:11:51] weights I would use a stride of one beam now the weights that affect my input are [01:11:53] now the weights that affect my input are the green ones and so on and I would [01:11:56] the green ones and so on and I would just come valve as I do usually and so [01:12:02] just come valve as I do usually and so on and now one step down I see that the [01:12:05] on and now one step down I see that the weights that are impacting my input are [01:12:07] weights that are impacting my input are the purple ones so I would put a purple [01:12:10] the purple ones so I would put a purple here and so on so I just do the [01:12:12] here and so on so I just do the convolution like that and so so one [01:12:17] convolution like that and so so one thing that is interesting here is that [01:12:19] thing that is interesting here is that the values that are blue in my out 6x6 [01:12:22] the values that are blue in my out 6x6 output were generated only using the [01:12:26] output were generated only using the blue values of the filter the blue [01:12:28] blue values of the filter the blue weights in the filter the ones that are [01:12:32] weights in the filter the ones that are green were only used you were only [01:12:34] green were only used you were 
only generated using the green values of my [01:12:36] generated using the green values of my filter so actually this subsample [01:12:38] filter so actually this subsample sub-pixel [01:12:39] sub-pixel convolution or deconvolution could have [01:12:42] convolution or deconvolution could have been done with for convolutions with the [01:12:46] been done with for convolutions with the blue weights green weights purple white [01:12:49] blue weights green weights purple white sand yellow weights and then just just [01:12:52] sand yellow weights and then just just replaced such that the adjustment would [01:12:56] replaced such that the adjustment would be the output [01:12:57] be the output just put the output of each of these [01:13:00] just put the output of each of these comp and mix them to give out a 6x6 [01:13:03] comp and mix them to give out a 6x6 output only thing you need to know we [01:13:05] output only thing you need to know we have an input 4x4 and we get an output [01:13:07] have an input 4x4 and we get an output 6x6 that's what we wanted we wanted to [01:13:09] 6x6 that's what we wanted we wanted to of sample the image we can retrain the [01:13:11] of sample the image we can retrain the weights or use the transport version of [01:13:13] weights or use the transport version of them so let's see what happens now we [01:13:15] them so let's see what happens now we understood what what the curve was doing [01:13:18] understood what what the curve was doing so we're able to decomp what we need to [01:13:20] so we're able to decomp what we need to do is also to ampoule and to unreal ooh [01:13:24] do is also to ampoule and to unreal ooh fortunately it's easier than the decomp [01:13:26] fortunately it's easier than the decomp so we're not going to do board work [01:13:27] so we're not going to do board work anymore so let's see how uncool works if [01:13:31] anymore so let's see how uncool works if I give you this input to the pool link [01:13:34] I give 
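As an aside, the zero-insertion picture above is easy to code up. This is a toy numpy sketch, not the lecture's code — the function name, sizes, and the all-ones filter are ours, chosen to match the board: a 4x4 input becomes 9x9 after zero-insertion and padding, and a 4x4 filter at stride 1 yields a 6x6 upsampled output.

```python
import numpy as np

def upsample_transposed_conv(x, filt, pad=1):
    """Upsample x by zero-insertion, then run an ordinary valid convolution
    (the subpixel / transposed-convolution picture from the lecture)."""
    h, w = x.shape
    # Insert a zero between every pair of neighbouring pixels: 4x4 -> 7x7.
    dil = np.zeros((2 * h - 1, 2 * w - 1))
    dil[::2, ::2] = x
    # Pad the border: 7x7 -> 9x9.
    dil = np.pad(dil, pad)
    kh, kw = filt.shape
    oh, ow = dil.shape[0] - kh + 1, dil.shape[1] - kw + 1  # 6x6 here
    out = np.zeros((oh, ow))
    for i in range(oh):          # ordinary convolution, stride 1
        for j in range(ow):
            out[i, j] = np.sum(dil[i:i + kh, j:j + kw] * filt)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
filt = np.ones((4, 4))           # a 4x4 filter, as on the board
y = upsample_transposed_conv(x, filt)
print(y.shape)  # (6, 6)
```

Note that a 4x4 filter at stride 1 over the zero-inserted image is exactly four interleaved 2x2 sub-filters, which is the blue/green/purple/yellow decomposition described above.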
[01:13:37] If I give this input to a max pooling layer, the output is obviously going to be this one: 42 is the maximum of these four numbers, assuming we're using a 2x2 filter with a stride of two vertically and horizontally; 12 is the maximum of the green numbers, 6 is the maximum of the red numbers, and 7 of the orange ones. Now, a question: I give you back the output and I tell you, give me the input. Can you give me the input or not? No — why? You only kept the maximum, so you lost all the other numbers: I don't know the 0, 1 and -1 anymore, the red numbers here, because they didn't pass through the maximum. So max pooling is not invertible from a mathematical perspective; what we can do is approximate its inverse. How can we do that? Spread it out — that's a good point, we could spread the 6 among the four values, and that would be an approximation. A better way, if we managed to cache some values, is to cache something we call the switches: we cache the positions of the maxima using a matrix of zeros and ones, which is very cheap to store, and we pass it to the unpooling. Now we can approximate the inverse, because we know where 6 was, where 12 was, where 42 and 7 were. But it's still not invertible, because we lost all the other numbers.

[01:15:08] Think about max pool backpropagation — it's exactly the same thing. These numbers 0, 1, -1 had no impact on the loss function at the end, because they didn't pass through the forward propagation. So with the switches you actually get the exact backpropagation: you know that the gradients of the other values are going to be zeros, because they didn't affect the loss during the forward propagation. Does that make sense? OK, so this is pooling and max pooling, and with the switches we can approximately unpool.

[01:15:42] Yeah? Why don't we just cache the whole original input? We could cache the entire thing, but in terms of backpropagation, in terms of efficiency, we would just use the switches, because they're enough for unpooling. You're right that we could cache everything, but then it's cheating: you kept the input, so you just give it back.

[01:16:07] OK, so now we know how unpooling works; let's look at the ReLU. What we need to do, in fact, is to pass the switches and the filters back to the unpool and to the deconv in order to reconstruct: the switches are the matrices of zeros and ones indicating where the maxima were, and the filters are the filters that I will transpose, under the assumption on the board. And so on, and so on, and I get my reconstruction. I just need to explain the ReLU now. I give you this input to a ReLU and I forward-propagate it.
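The switches idea can be sketched in a few lines of numpy. The function names and the toy 4x4 input here are ours, chosen so the four maxima come out to 42, 12, 6 and 7 as on the slide:

```python
import numpy as np

def maxpool_with_switches(x):
    """2x2 max pooling with stride 2, also returning the 'switches':
    a 0/1 mask marking where each maximum came from."""
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    switches = np.zeros_like(x)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            win = x[i:i + 2, j:j + 2]
            out[i // 2, j // 2] = win.max()
            di, dj = np.unravel_index(win.argmax(), win.shape)
            switches[i + di, j + dj] = 1
    return out, switches

def unpool(pooled, switches):
    """Approximate inverse: put each max back where the switches say it
    was, zeros everywhere else (the non-maxima are lost for good)."""
    up = switches.copy()
    h, w = switches.shape
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            up[i:i + 2, j:j + 2] *= pooled[i // 2, j // 2]
    return up

x = np.array([[1., 42., 5., 12.],
              [0.,  3., 2.,  7.],
              [6., -1., 4.,  7.],
              [0.,  1., 6.,  5.]])
pooled, sw = maxpool_with_switches(x)   # pooled is [[42, 12], [6, 7]]
recon = unpool(pooled, sw)              # maxima restored, zeros elsewhere
```

Note that `sw` is exactly the mask that max pool backpropagation uses to route gradients, which is why the switches give the exact backward pass for free.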
[01:16:41] What do we get? All the negative numbers are going to be set to 0, and the others are going to be kept. Now let's say I'm doing a backpropagation through the ReLU. If I give you the gradients that are coming back and I ask you: what are the gradients after the ReLU during the backpropagation — how does the ReLU behave in backprop? Zeros. Which ones are zeros? The negatives. Do you agree that the entries that were negative in this yellow matrix are going to give zeros during the backprop? Always think about what the influence of the input was on the loss function, and you will find out what the backpropagation looks like. Look at this number here, -2: did the fact that it was -2 have any influence on the loss function? No — it could have been -10, it could have been -20, it's not going to impact the loss function. So what do you think should be the number here? Zero, even if the gradient that is coming back is 10.

[01:18:16] Same idea as max pooling: what we need to do is remember the switches — remember which of these values had an impact on the loss. We pass the switches; all these values here that are whited out had no impact on the loss function, so when you backpropagate, their gradient should be set to zero. It doesn't matter to update them, it's not going to make the loss go down. So these are all zeros, and the rest just pass through. Why do they pass with the same value? Because the derivative of ReLU for positive numbers is one; this number here that passed the ReLU during forward propagation was not modified, so its gradient passes through unchanged. That makes sense. So this is ReLU backward.

[01:19:04] Now, in this reconstruction method we're not going to use ReLU backward; we're going to use something we call the ReLU deconvnet. The intuition for why we're not using ReLU backward is that what we're interested in is knowing which pixels of the input positively affected the activation that we're talking about; so what we're going to do is just apply a ReLU, not a ReLU backward. Another reason is that when we reconstruct, we want minimal influence from the forward propagation: we don't really want our reconstruction to depend on the forward pass, we would like it to be unbiased — just look at this activation and reconstruct what happened. So that's what we're going to use. Again, this is a hack that was found through trial and error, and it's not going to be scientifically viable all the time.
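The distinction between ReLU backward and the deconvnet ReLU is easiest to see side by side. A minimal numpy sketch (the names are ours, not from any library):

```python
import numpy as np

def relu_backward(grad, fwd_input):
    """True backprop: zero the signal wherever the *forward input* was
    negative, since those entries never influenced the loss."""
    return grad * (fwd_input > 0)

def relu_deconvnet(grad):
    """Deconvnet rule: just apply a ReLU to the signal coming back,
    ignoring the forward pass, so the reconstruction is not biased by it."""
    return np.maximum(grad, 0)

fwd = np.array([[1., -2.], [3., -4.]])   # forward-pass inputs to the ReLU
g = np.array([[10., 10.], [-5., 7.]])    # signal flowing back

b = relu_backward(g, fwd)     # zeros where fwd was negative
d = relu_deconvnet(g)         # zeros where the backward signal is negative
```

With these inputs, `relu_backward` keeps 10 and -5 (the positions where `fwd` was positive), while `relu_deconvnet` keeps 10, 10 and 7 (the positive parts of the backward signal itself) — the deconvnet keeps only what contributes positively to the chosen activation.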
[01:20:00] OK, so now we can do everything, and we can reconstruct and find out what an activation corresponds to. It took time to understand, but it's super fast to do — just one pass, not iterative — and we could do it with every layer. Let's say we do it with the first block of conv, ReLU, max pool: I go here, I choose an activation, I find the maximum activation, I set all the others to zero, I unpool, I deconv, and I find the reconstruction — this activation was looking at edges like that.

[01:20:33] So let's delve into the details and see how we can visualize what's happening inside the network. All the visualizations we're going to see now can be found in Matthew Zeiler and Rob Fergus's paper "Visualizing and Understanding Convolutional Networks". I'm going to explain what they correspond to, but check out their paper if you want to understand more of the details.

[01:20:53] What happens here is that on the top left you have nine pictures: these are the cropped pictures from the dataset that activated the first filter of the first layer the most. We have a first filter of the first layer, we ran the whole dataset through, and we recorded the main pictures that activate this filter; these were the main ones. We did the same thing for all the filters of the first layer, and there are nine by nine of them — a lot of them, I think. At the bottom here you have the filters themselves, whose weights were plotted: just take the filter and plot the weights. This is important: it is interpretable only for the first layer; when you go deeper in the network, the filter itself cannot be interpreted, it's super hard to understand. Here, because the weights directly multiply the pixels, the first layer weights can be interpreted. In fact, let's look at the third filter here on the first row: it has weights that look like one of the diagonals, and if you look at the data that maximized this filter's activation — the feature map corresponding to this filter — they're all cropped images that correspond to diagonals. That's what happens.

[01:22:17] Now, the deeper we go, the more fun we have. So let's go to the results on a validation set of 50,000 images. What happened here is that they took 50,000 images, forward-propagated them through the network, and recorded which image maximized the activation of the feature map corresponding to the first filter of layer two, the second filter, and so on for all the filters. Let's look at one of them: we have a circle on this one; it means that
the filter which generated the feature map corresponding to this image was activated by, probably, a wheel or something like that: the image of the wheel was the one that maximized the activation of this feature map, and then we used the deconvnet method to reconstruct it. Any questions on that? Yeah? Good question: what if the activation function is not a ReLU? In practice you would just use its backward to reconstruct; if it's tanh, you would use the same type of method and try to approximate the reconstruction.

[01:23:31] OK, let's go a little deeper. Same thing for layer two: forward-propagate all the images of the dataset, find the nine images that lead to the maximum activation of the first filter; these are plotted on top here. What you can see, for this filter on the sixth row, first column, is that the features are more invariant to small changes: this filter was actually activated by many different types of circles, spirals, wheels — it's still activated although the circles have different sizes.

[01:24:04] We can go even deeper, up to the third layer. What's interesting is that the deeper you go, the more complexity you see: at the beginning we were seeing only edges, now we see much more complex figures. You can see a face here in this entry; it means that this filter activated when it saw a data point that had this face, and then we reconstructed it and cropped it to the face. The face is kind of red: the more red it is, the more activation it led to. And same thing for the top nine images for layer three: these are the nine images that actually led to the face, the ones that maximize the activation of the feature map corresponding to that filter, and so on.
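The bookkeeping just described — run every image through and keep the ones that maximize one filter's feature map — can be sketched like this (toy random data and a single-channel valid convolution, just to show the procedure; the names and sizes are ours):

```python
import numpy as np

def feature_map_max(img, filt):
    """Maximum activation of one filter's feature map (valid convolution)."""
    kh, kw = filt.shape
    h, w = img.shape
    best = -np.inf
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            best = max(best, np.sum(img[i:i + kh, j:j + kw] * filt))
    return best

rng = np.random.default_rng(0)
dataset = [rng.standard_normal((8, 8)) for _ in range(50)]  # stand-in images
filt = rng.standard_normal((3, 3))                          # one filter

# Score every image by how strongly it activates this filter's feature map,
# then keep the nine strongest, as in the Zeiler-Fergus figures.
scores = [feature_map_max(img, filt) for img in dataset]
top9 = np.argsort(scores)[-9:][::-1]
```

In the paper these top images are then pushed back through unpool / deconvnet-ReLU / transposed filters to show which pixels drove the activation.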
[01:25:56] normalization layers. We can switch back and forth between showing the actual activations and showing images synthesized to produce high activations. Here he is giving his own image to the network; by the time we get to the fifth convolutional layer, the features being computed represent abstract concepts. For example, this neuron seems to respond to faces. We can further investigate this neuron by showing a few different types of information. First, we can artificially create optimized images using new regularization techniques; these confirm the suspicion that this neuron fires in response to a face. We can also show the images from the training set that activate this neuron the most, as well as the pixels from those images most responsible for the high activations, computed via the deconvolution. The deconv brings out the feature's response to multiple faces in different locations, and by looking at it we can see that it would respond more strongly if we had even darker eyes and rosier lips. We can also confirm that it cares about the head and shoulders but ignores the arms and torso. We can even see that it fires, to some extent, for cat faces. Using backprop or deconv, we can see that this unit depends most strongly on a couple of units in the previous conv layer, and on about a dozen or so in the conv layer before that.

[01:27:16] Let's look at another neuron. What is this unit doing? From the top nine images we might conclude that it fires for different types of clothing, but examining the synthetic images shows that it may be detecting not clothing per se but wrinkles. In the live plot we can see that it's activated by my shirt, and smoothing out half of my shirt causes that half of the activations to decrease. Finally, here's another interesting neuron: this one has learned to look for printed text in a variety of sizes, colors and fonts. This is pretty cool, because we never asked the network to look for wrinkles or text or faces; the only labels we provided were at the very last layer. So the only reason the network learned features like text and faces in the middle was to support final decisions at that last layer. For example, the text detector may provide good evidence that a rectangle is in fact a book seen on edge, and detecting many books next to each other might be a good way of detecting a bookcase, which was one of the categories we trained the net to recognize. In this video we've shown some of the features of the Deep Visualization Toolbox and a few of the things we've learned by
using it you can download it [01:28:24] learned by using it you can download it yeah so they have a toolbox which is [01:28:27] yeah so they have a toolbox which is exactly what you need right here and you [01:28:29] exactly what you need right here and you could test the toolbox on your model [01:28:32] could test the toolbox on your model takes time to get get it to run but but [01:28:35] takes time to get get it to run but but if you want to visualize all the neurons [01:28:37] if you want to visualize all the neurons it's very helpful okay so let's go [01:28:41] it's very helpful okay so let's go quickly we'll spend about three minutes [01:28:43] quickly we'll spend about three minutes on the optional deep dream one cause [01:28:45] on the optional deep dream one cause it's fun and yeah feel free free to jump [01:28:49] it's fun and yeah feel free free to jump in and ask questions so the Google and [01:28:56] in and ask questions so the Google and the page the blog post is by Alexander [01:29:00] the page the blog post is by Alexander morte Vince F the idea here is to [01:29:02] morte Vince F the idea here is to generate art using this knowledge of [01:29:04] generate art using this knowledge of visualization and how they do that is [01:29:07] visualization and how they do that is quite interesting then we take an input [01:29:10] quite interesting then we take an input for propagated to the network and I took [01:29:14] for propagated to the network and I took specs to declare that we called the [01:29:16] specs to declare that we called the Dreamliner then we'll take the [01:29:19] Dreamliner then we'll take the activation and set the gradient to be [01:29:21] activation and set the gradient to be equal to these activations the gradient [01:29:24] equal to these activations the gradient at this layer and then back propagate [01:29:25] at this layer and then back propagate the gradient uniqua so earlier what we [01:29:29] the gradient uniqua so earlier what we 
do is define a new objective function equal to an activation, and we try to maximize that objective function. Here they do it even more strongly: they take the activations and set the gradients to be equal to the activations, so the stronger the activation, the stronger it is going to become later on, and so on. They are trying to see what the network is activating for, and to increase that activation even further. [01:29:57] So: forward propagate the image, set the gradient of the dream layer to be equal to its activations to exaggerate them, back propagate all the way back to the input, and update the pixels of the image. Do that several times; every time, the activations will change, so you have to set the new activations as the gradients of the dream layer again and back propagate. After a few iterations you would see things happening. It's hard to see here on the screen.
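The loop just described can be sketched numerically. This is a minimal toy version, not the Google implementation: a single ReLU layer stands in for the dream layer, and `W`, the sizes, and the learning rate are made-up stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))  # stand-in weights for the layers up to the dream layer

def dream_step(x, lr=0.05):
    """One DeepDream-style update of the input pixels x."""
    a = np.maximum(W @ x, 0.0)         # forward propagate to the dream layer (ReLU)
    grad_a = a                         # set the layer's gradient equal to its activations
    grad_x = W.T @ (grad_a * (a > 0))  # back propagate through ReLU and the linear map
    return x + lr * grad_x             # gradient ascent on the image pixels

x = rng.normal(size=64)
before = np.linalg.norm(np.maximum(W @ x, 0.0))
for _ in range(20):                    # repeat; activations change, so recompute each time
    x = dream_step(x)
after = np.linalg.norm(np.maximum(W @ x, 0.0))
print(after > before)                  # the dream layer's activations get exaggerated
```

Each iteration recomputes the activations and re-seeds the gradient with them, which is exactly the "you have to set it again every time" point above.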
But you would have a pig appearing here, you'd have a tree somewhere there, and a lot of animals are going to start appearing in this cloud. [01:30:29] It's interesting, because it means, let's say you see this cloud here: if the network thought that this cloud looked a little bit like a dog, then one of the feature maps, the one generated by the filter that detects dogs, would activate a little bit, and because we set the gradient to be equal to the activation, that is going to increase the appearance of the dog in the image, and so on, and then you will see a dog appearing after a few iterations. It's quite fun, and if you zoom in you see that type of thing: you see a big snail, a kind of pig with a snail carapace, a camel-bird, a dog-fish. [01:31:08] I advise you to look at this on the slides rather than on the screen, but it's quite fun.
And same, if you give that type of image: because the network thought there was something a little bit like a tower in it, you will increase the network's confidence in the fact that there is a tower by changing the image, and the tower will come out, and so on. It's quite cool. [01:31:34] And if you dream in lower layers, you will obviously see edges appearing or patterns coming out, because the lower layers tend to detect edges, and then you increase the network's confidence in an edge, so it will basically create an edge on the image. [01:31:53] [DeepDream on video] [Music] [Applause] [01:32:32] You get something trippy on the side. So one insight that is fun about it (and this is not only for DeepDream, it also holds for most of the gradient ascent stuff): let's say we have an output score for the dumbbell class, and we define our objective function to be the dumbbell score,
and we try to find the image that maximizes it. We will see something like this. The interesting thing is that the network thinks that a dumbbell is a hand with a dumbbell, not only the dumbbell, and you can see it here: you see the hands. The reason is that it has never seen a dumbbell alone; probably in the training set there is no picture of a dumbbell alone in a corner with nothing else around it. Instead, it's usually a human lifting it. [01:33:23] Okay, so just to summarize what we've learned today, we are now able to answer all the following questions. What is the role of a given neuron, feature map, or layer? Deconvolution to reconstruct, searching the data set for the top images, and gradient ascent. Can we check what the network focuses on? Occlusion sensitivity, saliency maps, class activation maps. How does the network see our world? I would say gradient ascent, maybe DeepDream, those cool things.
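The dumbbell experiment is plain gradient ascent on a class score with respect to the input. Here is a hedged sketch with a toy linear classifier (the weights, sizes, and class index are invented for illustration; for a real convnet you would compute the input gradient with autodiff rather than reading off a weight row):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 64))  # toy model: logits = W @ x, 10 classes

def maximize_class_score(x, class_idx, lr=0.1, steps=100):
    """Gradient ascent on one class's logit with respect to the input image x."""
    for _ in range(steps):
        # for this linear model, d(logits[class_idx]) / dx is simply row class_idx of W
        x = x + lr * W[class_idx]
    return x

x0 = np.zeros(64)                    # start from a blank image
x1 = maximize_class_score(x0, class_idx=3)
print((W @ x1)[3] > (W @ x0)[3])     # True: the chosen class's score strictly increases
```

The hands appear in the dumbbell image because the ascent can only reflect what the model learned from its data, where dumbbells always co-occur with arms.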
And then, what are the implications and use cases of these visualizations? [01:34:02] You can use saliency maps for segmentation, as in the assignment; that's not very useful given the newer methods we have, but the deconvolution that we've seen together is widely used for segmentation and reconstruction, and also in generative adversarial networks to generate images. [01:34:18] Sometimes these visualizations are also helpful to detect whether some of the neurons in your network are dead. Let's say you have a network, you use the toolbox, and you see that whatever input image you give, some feature maps are always dark: it means that the filter that generated that feature map, by convolving over the inputs, probably never detected anything, so it's not even being trained. That's the type of insight you can get. Okay, thanks guys; sorry we went over time. ================================================================================ LECTURE 008
================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 8 - Career Advice / Reading Research Papers Source: https://www.youtube.com/watch?v=733m6qBH-jI --- Transcript [00:00:05] Okay everyone, let's get going. As usual, if you have not yet, please enter your SUID so that we know you're here in this room. [00:00:19] Is the computer okay at the back? Is the volume okay at the back? All right, no one's responding... yes? Okay, all right, thank you. [00:00:33] So what I want to do today is share two things. You know, we're approaching the end of the quarter; I hope you guys are looking forward to the Thanksgiving break next week. And those of you viewing this from outside California, know that we're all feeling the really bad air here in California; if some of you are watching from home, I hope you have better air wherever you are. [00:00:59] But what I hope to do today is give you some advice
that will set you up for the future, even beyond the conclusion of CS230. In particular, what I want to do today is share with you some advice on how to read research papers, because deep learning is evolving fast enough that even though you've learned a lot of the foundations of deep learning, learned the tips and tricks, and currently know better than many practitioners how to actually get deep learning algorithms to work, when you're working on specific applications, whether in computer vision or natural language processing or speech recognition or something else, being able to efficiently navigate the academic literature on key parts of the deep learning world will help you keep developing and stay on top of ideas even as they evolve over the next several years or maybe decades. [00:01:53] So the first thing I want to do is give you advice on how, say,
when I'm trying to master a new body of literature, I go about that, in the hope that those techniques will help you be more efficient in how you read research papers. And the second thing is that in previous offerings of this course, one request from a lot of students was advice on navigating a career in machine learning, so in the second half of today I want to share some thoughts with you on that. [00:02:26] Okay, so I guess two topics: reading research papers, and second, career advice in machine learning. [00:02:35] It turns out that reading research papers is one of those things that a lot of PhD students learn by osmosis, meaning that if you're a PhD student, you'll see a few professors and other PhD students do certain things and try to pick them up by osmosis, but I hope today to accelerate
your efficiency in how you acquire knowledge yourself from the academic literature. [00:03:05] So let's say there's an area you want to become good at; say you want to build a speech recognition system. I'll use that example for now: say you want to build the speech recognition system we talked about, with the "Robert, turn on the desk lamp" example. There's a sequence of steps I recommend you take, which is: first, compile a list of papers. And by papers I mean both research papers, often posted on arXiv or elsewhere on the internet, but also maybe Medium posts, and maybe some occasional GitHub posts; those are rarer, but whatever texts or learning resources you have. [00:03:54] And then what I usually do is end up skipping around the list. So if I'm trying to master a new body of knowledge, say you want to learn about speech recognition systems, this is
what it feels like to read a set of papers: maybe you initially start off with five papers, and if on the horizontal axis I plot 0% to 100% read / understood, the way it feels to read these papers is often: you read 10% of each paper, trying to quickly skim and understand each of them, and if based on that you decide that paper two is a dud (other authors have even cited it and said, boy, they sure got it wrong, and when you read it, it just doesn't make sense), then go ahead and forget it. [00:04:49] And as you skip around to different papers, you might decide that paper three is the really seminal one, and then spend a lot of time going ahead and reading and understanding the whole thing. Based on that, you might then find a sixth paper from the citations and read that, go back and make sure you understand paper four, and then find a paper seven and go and read
that all [00:05:10] a paper seven and go and read that all the way to the conclusion but this is [00:05:13] the way to the conclusion but this is what it feels like as you you know [00:05:15] what it feels like as you you know assemble a list of papers and skip [00:05:16] assemble a list of papers and skip around and try to master a body of [00:05:20] around and try to master a body of literature right around some topic that [00:05:22] literature right around some topic that you want to learn and I think um some [00:05:25] you want to learn and I think um some rough guidelines you know if you read [00:05:28] rough guidelines you know if you read fifty to twenty papers I think you have [00:05:30] fifty to twenty papers I think you have a basic understanding of an area right [00:05:32] a basic understanding of an area right may be good enough to do some work apply [00:05:35] may be good enough to do some work apply some algorithms if you read 50 to 100 [00:05:39] some algorithms if you read 50 to 100 papers in an area and they speech [00:05:41] papers in an area and they speech recognition and and kind of understand a [00:05:43] recognition and and kind of understand a lot of it then that's pretty enough to [00:05:45] lot of it then that's pretty enough to give you a very good understanding of an [00:05:47] give you a very good understanding of an area right you you might I don't know [00:05:49] area right you you might I don't know I'm always careful about when I say you [00:05:51] I'm always careful about when I say you know you're mastering a subject but you [00:05:53] know you're mastering a subject but you read fifty a hundred papers on speech [00:05:54] read fifty a hundred papers on speech recognition you have a very good [00:05:56] recognition you have a very good understanding of speech recognition or [00:05:58] understanding of speech recognition or if you're interested in say domain [00:05:59] if you're interested in say domain adaptation right by the time 
you've read 50 or 100 papers, you have a very good understanding of a subject like that. But the 15 to 20 papers is probably enough for you to implement things, though maybe not enough for you to do research or be really at the cutting edge. These are maybe some guidelines for the volume of reading you should aspire to if you want to pick up a new area, or take one of the subjects in CS230 and go more deeply into it. [00:06:22] Now, how do you read one paper? I hope most of you brought your laptops, so what I'm going to do is describe how I read one paper, and then after that I'm going to ask all of you to download a paper online and take a few minutes to read it, right here in class, and see how far you can get understanding a research paper in
just a few minutes, right here in class. [00:06:55] Okay, so when reading one paper: the bad way to read a paper is to go from the first word until the last word. Oh, and by the way, let me tell you what my real life is like. Pretty much everywhere I go, in my backpack, this is my actual folder of unread papers. So pretty much everywhere I go, I actually have a stack of papers that is my personal reading list. This is actually my real life; I didn't bring this just to show you, it's in my backpack all the time. [00:07:34] And these days, on my team at Landing AI and deeplearning.ai, I personally lead a reading group where I lead a discussion about two papers a week, but to select two papers it means I need to read like five or six papers a week to select the two, you know, to
present or discuss at the Landing AI and deeplearning.ai reading group meeting. So this is my real life, and how I try to stay on top of the literature, and if I can find the time to read a couple of papers a week, hopefully all of you can too. [00:08:07] But when I'm reading a paper, this is how I would recommend you go about it: don't go from the first word and read until the last word; instead, take multiple passes through the paper. [00:08:23] So step one is: read the title, the abstract, and also the figures. Especially in deep learning, there are a lot of research papers where the entire paper is summarized in one or two figures and the figure captions. So sometimes, just by reading the title, the abstract, and the key neural network architecture figure that describes what the
whole paper is about, and maybe one or two of the experiments, you can get a very good sense of what the whole paper is about without hardly reading any of the text in the paper itself. That's the first pass. [00:09:07] Second pass: I tend to read more carefully the intro and the conclusions, look carefully at all the figures again, and then skim the rest. [00:09:31] And I don't know how many of you have published academic papers, but when people publish academic papers, part of the publication process is convincing the reviewers that your paper is worthy of acceptance, and so what you find is that the abstract, intro, and conclusion are where the authors summarize their work really carefully, to make a really clear case to the reviewers for why they think their paper should be accepted
for publication. And because of that, maybe it's a slightly unusual incentive, the intro and conclusion often give a very clear summary of what the paper is actually about. [00:10:20] And, to be bluntly honest with you guys: the related work section is sometimes useful if you want to understand related work and figure out what the most important works in the area are, but the first time through you might skim or even skip the related work section. It turns out that unless you're really familiar with the literature, if this is a body of work you're not that familiar with, the related work section is sometimes almost impossible to understand. And again, since I'm being very honest with you guys, sometimes the related work section is where the authors try to cite everyone that could
possibly be reviewing the paper, to make them feel good and hopefully accept the paper, so related work sections are sometimes written in funny ways. [00:11:02] And then step three: I often read the paper but skip the math, or read the whole thing but skip past anything that doesn't make sense. [00:11:44] You know, one thing that has happened many times in research is that papers tend to be cutting-edge research, and so when we publish things we sometimes don't know what's really important and what's not. So there are many examples of well-known, highly cited research papers where some of it was just great stuff and some of it turned out to be unimportant, but at the time the paper was written the authors did not know (no one on the planet knew) what was important and what was not. And maybe one example is the LeNet-5 paper, the
[00:12:17] Part of it was phenomenal: it established a lot of the foundations of convnets, and so it's one of the most incredibly influential papers. But if you go back and read that paper, a whole other half of the paper was about other stuff, right, transducers and so on, that is much less used. And so it's fine if you read a paper and some of it doesn't make sense, because that's not that unusual. Sometimes it just happens that great research means we're publishing things at the boundaries of our knowledge, and sometimes, for the stuff you see, you'll realize five years in the future that it wasn't the most important thing after all, that the key part of the algorithm maybe wasn't what the authors thought. So sometimes parts of a paper don't make sense; it's okay to skim it initially and move on.
[00:13:01] Unless you're trying to do deep research and really need to master it, in which case go ahead and spend more time, if you're trying to get through a lot of papers, then, you know, it's just about prioritizing your time. Okay, and so just a few last things, and then I'll ask you to practice this yourself with a paper, right. [00:13:27] Um, you know, I think that when you've read a paper, these are questions to try to keep in mind, and when you read a paper, in a few minutes maybe, try to answer these questions: what did the authors try to accomplish, what were the key elements, and what can you use yourself? What I hope to do in a few minutes is ask you to download a paper off the internet, read it, and then try to answer these questions and discuss your answers with your peers, with others in the class.
[00:14:45] Um, okay, so I think if you can answer these questions, hopefully that will reflect that you have a pretty good understanding of the paper. Okay, and so what I would like you to do is pull out your laptop. [00:15:01] So I think on the convnet videos, right, on the deeplearning.ai convnet videos on Coursera, you learned a bit about various neural network architectures such as ResNets, and it turns out there's a follow-on piece of work that builds on some of the ideas of ResNets, which is called DenseNet. So what I'd like you to do is actually try this. And when I'm reading a paper, again, in the earliest stages, don't get stuck on the math; just go ahead and skim the math and read the English text, which you can get through faster. Maybe one of the principles is to go to the very efficient, high-information-content parts first and then go to the harder material later. [00:15:42] Remember, that's why often I'll just skim the math, and if I don't understand some derivation I'll just move on, and only later go back and really try to figure out the math more carefully, okay?

[00:15:54] So what I'd like you to do is, let's see, let's have you take seven minutes, where I'm thinking maybe one minute per page, which is quite fast, and search for this paper: Densely Connected Convolutional Networks. [00:16:24] Okay, so once you guys take out your laptops, search for this paper and download it; you'll usually find it on arXiv, right, and this is also sometimes called DenseNets, I guess. Go ahead and take like seven minutes to read this paper, and I'll let you know when the time has passed, and then after that time I'd like you to, you know, discuss your answers.
[00:16:56] Work with the others, right, on what you think the answers are, especially the first two questions; the other two you can leave off. So go ahead and take a few minutes to do that now, and I'll let you know when, sort of, seven minutes have passed, and then you can discuss your answers with your friends.

[00:17:20] All right guys, so, anyone with any thoughts or insights or surprises from this? So now you've spent eleven minutes on this paper, right: seven minutes reading, four minutes discussing, which is a really, really short period of time, but any thoughts? What do you think of the paper? [00:17:41] You all just spent a lot of time saying stuff to each other; what do people think of the time you spent trying to read the paper? Actually, tell you what: raise your hand if you, you know, kind of understood the main concepts and the figures.
[00:18:20] Wow, people are really less than energetic today; unusual. So I think this is one of those papers where the paper is almost entirely summarized in Figures 1 and 2. If you look at Figure 1 and the caption there, and Figure 2 on page 3 and the caption there, and understand those two figures, those really convey, you know, 80% of the idea of the paper, right? [00:18:59] And I think a couple of other tips. So it turns out that as you read these papers, with practice you do get faster. So for example, Table 1 on page 4, right, this mass of a table at the top: this is a pretty common format, or a format like this is what a lot of authors use to describe their network architecture, especially in computer vision. So one of the things you find as well is that the first time you see something like Table 1 it just looks really complicated, but by the time you've read a few papers in a similar format, you can look at Table 1 and go, oh yep, got it, you know, this is the DenseNet-121 and this is the 169 architecture, and be able to pick those things up more quickly. And so another thing you'll find is that reading these papers actually gets better with practice, because you see different authors use different ways, or similar ways, of expressing themselves, and as you get used to that you'll actually get faster and faster at understanding these ideas.

[00:19:59] And I think, you know, these days when I'm reading a paper like this, it maybe takes me about half an hour, and I know I gave you guys seven minutes when I thought I would need maybe half an hour to figure out a paper like this. And I find it's not unusual for people relatively new to machine learning to need maybe an hour to kind of, you know, really understand a paper like this, and then, although I'm pretty experienced in machine learning, I'm down to maybe half an hour for a paper like this, maybe even twenty minutes if it's a really easy one. But there are some outliers: my colleagues and I sometimes stumble across a really difficult paper where you need to chase down a lot of references and learn a lot of other things first, so sometimes you come across a paper that takes you three or four hours, or even longer, to really understand. But I think, depending on how much time you want to spend, by reading papers you can actually learn a lot, right, doing what you just did, but maybe spending half an hour or an hour per paper rather than seven minutes. [00:21:01] So, all right, I feel like, yeah, that's great.
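If you want to check your read of Figures 1 and 2, the dense-connectivity idea they illustrate can be sketched in a few lines of NumPy. This is only a toy on flat vectors, with random linear maps standing in for the paper's actual BN-ReLU-Conv layers; the function name, shapes, and growth-rate value here are illustrative choices, not the authors' implementation.

```python
import numpy as np

def toy_dense_block(x, num_layers, growth_rate, rng):
    """Toy DenseNet-style block on a flat feature vector.

    Each 'layer' is a random linear map (standing in for BN-ReLU-Conv)
    whose input is the concatenation of x and ALL previous outputs,
    and which emits growth_rate new features.
    """
    features = [x]  # running list of feature maps
    for _ in range(num_layers):
        concat = np.concatenate(features)            # dense connectivity
        w = rng.standard_normal((growth_rate, concat.size))
        features.append(np.tanh(w @ concat))         # k new features
    return np.concatenate(features)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
out = toy_dense_block(x, num_layers=4, growth_rate=3, rng=rng)
# output size = input size 8 + (4 layers x growth rate 3) = 20
print(out.shape)  # (20,)
```

The point the figures make is visible in the arithmetic: the feature count grows linearly (input plus layers times growth rate) because every layer's output is kept and re-fed to all later layers.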
[00:21:09] And notice that I've actually not said anything about the content of this paper, right? So whatever you guys just learned, that was all you; I had nothing to do with it. So there you go: you can go and learn this stuff by yourself, you don't need me anymore. [00:21:22] So just the last few comments, let's see. The other question I often get is, uh, you know, where do you go? The deep learning field evolves so rapidly, so where do you go if you're trying to master a new body of knowledge? Definitely do web searches, and there are often good blog posts on, you know, here are the most important papers in speech recognition; there are lots of great resources there. And then the other thing a lot of people want to do is try to keep up with the state of the art of deep learning even as it's evolving rapidly, and so I'll just tell you where I go to keep up with, you know, discussions and announcements. [00:22:05] Surprisingly, Twitter is becoming a surprisingly important place for researchers to find out about new things. There's an ML subreddit that's actually pretty good: a lot of noise, but many important pieces of work do get mentioned there. Some of the top machine learning conferences are NIPS, ICML, and ICLR, right, and so whenever these conferences come around, take a look and glance through at least the titles to see if there's something that interests you. [00:22:38] And then I think I'm fortunate, I guess, to have friends, you know, both colleagues here at Stanford as well as colleagues on several of the teams that I work with, that just tell me when there's a cool paper. But I think, whether here within Stanford or in your workplace, for those of you taking this via SCPD, see if you can form a community that shares interesting papers. So all the groups I'm in are on Slack, and we regularly message each other when we find interesting papers.
[00:23:01] That's been great for me, actually, yeah. Oh, and Twitter. Let's see, Kian as well, you could follow him too; this is me, Andrew Ng, right. He probably doesn't share papers as often as I do, but, I don't know, you can also look at who we follow, and there will be a lot of good researchers that share all these things online. Oh, and there's a bunch of people that also use a website called Arxiv Sanity; I don't use it much myself, but lots of researchers like it. [00:23:43] Um, cool. So just two last tips for how to read papers and get good at this. [00:23:59] So, to more deeply understand a paper: some of the papers will have math in them, and actually, you'll learn about BatchNorm, right, in the second module, and if you read the BatchNorm paper, it's actually one of the harder papers you'll read.
you read The Bachelor [00:24:14] second modules if you read The Bachelor on paper is actually one of harder [00:24:16] on paper is actually one of harder papers you read there's a lot of math in [00:24:19] papers you read there's a lot of math in the derivation or vaginal but they're [00:24:21] the derivation or vaginal but they're papers like that and if you want to make [00:24:22] papers like that and if you want to make sure you understand in math here's what [00:24:25] sure you understand in math here's what I would recommend which is a read [00:24:27] I would recommend which is a read through it take detailed notes and then [00:24:30] through it take detailed notes and then see if you can read arrive in from [00:24:31] see if you can read arrive in from scratch so if you want to deeply [00:24:36] scratch so if you want to deeply understand the math of an algorithm like [00:24:37] understand the math of an algorithm like you know fashion or more the details of [00:24:40] you know fashion or more the details of back problems [00:24:41] back problems the good practice and I think a lot of [00:24:44] the good practice and I think a lot of them sort of a theory their own from the [00:24:48] them sort of a theory their own from the science and mathematics PhD says will [00:24:50] science and mathematics PhD says will use a practice like this [00:24:51] use a practice like this we're just go ahead and read the paper [00:24:52] we're just go ahead and read the paper make sure you understand it and then to [00:24:54] make sure you understand it and then to make sure you really really understand [00:24:56] make sure you really really understand it put a put aside the result and try to [00:25:00] it put a put aside the result and try to read arrive the math yourself from [00:25:02] read arrive the math yourself from scratch and you can start from a blank [00:25:03] scratch and you can start from a blank piece of paper and read arrive one of [00:25:05] piece of paper and 
read arrive one of these algorithms from scratch then [00:25:07] these algorithms from scratch then that's a good sign that you really [00:25:08] that's a good sign that you really understand it [00:25:09] understand it when I was a PhD student I did this a [00:25:11] when I was a PhD student I did this a lot right that you know I wouldn't be [00:25:13] lot right that you know I wouldn't be the text book or read the paper or [00:25:15] the text book or read the paper or something and then put aside whether I [00:25:17] something and then put aside whether I read and see if I could read arrived it [00:25:18] read and see if I could read arrived it from scratch starting from a blank piece [00:25:20] from scratch starting from a blank piece of paper as only if I could do that that [00:25:22] of paper as only if I could do that that I would you know feel like yep I think I [00:25:24] I would you know feel like yep I think I understand this piece of math and it [00:25:26] understand this piece of math and it turns out if you want me to do this type [00:25:27] turns out if you want me to do this type of map yourself is your ability to [00:25:30] of map yourself is your ability to derive this type of map we divide the [00:25:32] derive this type of map we divide the size of math that gives you the ability [00:25:34] size of math that gives you the ability to generalize to derive new novel pieces [00:25:37] to generalize to derive new novel pieces of map yourself so I think I actually [00:25:39] of map yourself so I think I actually learned all the math for several machine [00:25:41] learned all the math for several machine learning by doing this and this by read [00:25:43] learning by doing this and this by read arriving other people's work that [00:25:44] arriving other people's work that allowed me to learn how to divide my own [00:25:46] allowed me to learn how to divide my own novel algorithms and actually sometimes [00:25:49] novel algorithms and actually sometimes 
[00:25:49] And actually, sometimes you go to art galleries, right: you go to the Smithsonian and you see these art students, you know, sitting on the floor copying the great artworks, recreating paintings painted by the masters centuries ago. And so, just as today there are students sitting in the DeYoung Museum or wherever (I was at the Getty Museum in LA a few months ago and you actually see these art students copying the work of the masters), I think that if you want to become good at the math of machine learning yourself, this is a good way to do it. It's time-consuming, but then you can become good at it. [00:26:25] Anyway, and the same thing for code, right: I think the simple, you know, lightweight version of learning would be to download and run the open-source code if you can find it, and the deeper way to learn this material is to reimplement it from scratch.
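As one small example of what a from-scratch reimplementation can look like, here is a minimal sketch of the training-mode BatchNorm forward pass discussed earlier, in NumPy. The function name, shapes, and the self-check at the end are my own illustrative choices, not any particular library's API; the real exercise is writing something like this without looking at a reference.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm forward pass, from scratch.

    x: array of shape (batch, features). Each feature is normalized
    over the batch, then scaled by gamma and shifted by beta.
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta

# self-check: with gamma=1 and beta=0, every output column
# should have roughly zero mean and unit variance
x = np.random.default_rng(0).standard_normal((64, 4))
y = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(np.allclose(y.mean(axis=0), 0, atol=1e-6),
      np.allclose(y.std(axis=0), 1, atol=1e-2))  # prints: True True
```

A check like the one at the end is the coding analogue of re-deriving the math: you verify your implementation against a property you know must hold, rather than against someone else's code.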
scratch it's easy to download open [00:26:52] from scratch it's easy to download open sourcing and rather [00:26:53] sourcing and rather it works but if you can reemployment one [00:26:55] it works but if you can reemployment one of these algorithms from scratch then [00:26:58] of these algorithms from scratch then that's a strong sign that you really [00:26:59] that's a strong sign that you really understood this our problem okay all [00:27:06] understood this our problem okay all right and then longer-term advice um you [00:27:25] right and then longer-term advice um you know for you to keep on learning and [00:27:26] know for you to keep on learning and keep on getting better and better the [00:27:28] keep on getting better and better the more important thing is for you to learn [00:27:30] more important thing is for you to learn steadily not for you to have a focus [00:27:32] steadily not for you to have a focus intense activity you know like all of [00:27:35] intense activity you know like all of Thanksgiving you read 50 papers over [00:27:37] Thanksgiving you read 50 papers over Thanksgiving and then you're done for [00:27:39] Thanksgiving and then you're done for the rest of your life and it doesn't [00:27:40] the rest of your life and it doesn't work like that right and I think you're [00:27:42] work like that right and I think you're actually much better off reading to a [00:27:43] actually much better off reading to a few papers a week for the next year then [00:27:46] few papers a week for the next year then you know cramming everything right over [00:27:49] you know cramming everything right over over one long weekend or something [00:27:50] over one long weekend or something actually an education where she know [00:27:52] actually an education where she know that spaced repetition works better than [00:27:54] that spaced repetition works better than cramming so the same same thing same [00:27:57] cramming so the same same thing same body of learning if 
[00:27:58] For the same body of learning, if you learn a bit every week, spaced out, you actually have much better long-term retention than if you try to cram it in over a short term; this is a very solid result that we know from pedagogy and how the human brain works. [00:28:11] So, again, the way I live my life is: in my backpack I just always have a few papers with me. And you'll find that I read almost everything on a tablet, on my iPad, but I find that for research papers and things like that, where the ability to flip between pages and skim matters, I'm still more efficient on paper. So I read almost nothing on paper these days except for research papers, but that's just me; your mileage may vary, and maybe something else will work for you. Okay, all right, um, so let's see, that's it for reading research papers.
[00:28:44] I hope that while you're in CS230, you know, if some of you find some cool papers, or if you go further with the DenseNet paper and find an interesting result there, go ahead and post on Piazza. Or if any of you want to start a reading group with other friends here at Stanford, you know, I encourage you to look around the class and find a group here on campus, or among your CS230 classmates, or with your work colleagues for those of you taking this on SCPD, so that you can all keep, you know, studying together and learning and helping each other along. [00:29:19] Okay, so that's it for reading papers. The second thing I want to do today is give some longer-term advice on navigating a career in machine learning, right. Any questions about this before I move on? [00:29:33] All right, but I hope that was useful; some of this I wish I had known when I was a first-year PhD student.
all right, um, let's see, can we turn on the lights please? [00:29:52] Oh, all right. So kind of in response to requests from students in earlier versions of the class, you know, as we approach the end of the quarter I want to give some advice on how to navigate a career in machine learning. So today in machine learning there are so many opportunities to do so many exciting things; so how do you, you know, what do you want to do? [00:30:13] I'm going to assume that most of you will want to do one of two things. At some point, you know, you want to get a job, maybe a job that does work in machine learning, including a faculty position for those who want to become professors, but I guess eventually most people end up with a job; I guess there are other alternatives. But then some of you want to go on to more advanced graduate studies, although even after you get your PhD, at
some point most people do get a job after the PhD, and by job I mean either at a big company, you know, or a startup. But regardless of the details of this, I hope most of you want to do important work. [00:31:07] So what I'd like to do today is break down, you know, how you find a job, or join a PhD program, or whatever it is that lets you do important work. And I want to break this discussion into two steps. One is just, you know, how do you get a position: how do you get that job offer, or how do you get that offer of admission to the PhD program, or admission to the master's program, whatever you want to do. And then two is selecting a position: between, you know, going to this university versus that university, or between taking the job at this company versus that company, which are the ones that will tend to set you up for success, for
long-term personal success and career success and everything. By the way, I hope that all of these are just tactics to let you do important work; I hope that's what you want to do. Um, so, you know, what do recruiters look for? [00:32:12] And I think, just to keep the language simple, I'm going to talk about finding a job, but a lot of the same things apply for PhD programs; it's just that instead of saying recruiters I would say admissions committees. So let me just focus on the job scenario. [00:32:28] Most recruiters look for technical skills. For example, in a lot of machine learning interviews they'll ask you questions like, you know, would you use gradient descent or batch gradient descent or mini-batches, and what happens if the mini-batch size is too large or too small? Right, so
there are companies, many companies today, asking questions like that in the interview process, or: can you explain what a GRU is, and when would you use a GRU? So you really do get questions like that in many job interviews today. [00:33:01] So recruiters are looking for ML skills, and you'll often be quizzed on ML skills as well as your coding ability; and I think Silicon Valley has become quite good at giving people assessments to test for real skill in machine learning engineering and software engineering. Um, and then the other thing that many recruiters will look for is meaningful work. [00:33:43] In particular, you know, there are some candidates that apply for jobs that have very theoretical, very academic skills, meaning you can answer all the quiz questions about, you know, what batch norm is and how it's designed, but unless you've actually shown that you can apply this in a meaningful setting, it's harder to convince a company or recruiter that you don't just know the theory but that you know how to actually make this stuff work. And so having done meaningful work using machine learning makes you a very desirable candidate, I think, to a lot of companies. Work experience, whether you've done something meaningful, reassures them that you can actually do the work; that it's not just answering academic quiz questions, but being able to implement the algorithms so that they work. [00:34:38] And then many recruiters also look for your ability to keep on learning new skills and stay on top of machine learning as it evolves. And so a very common pattern for the successful, you know, AI
engineer or machine learning engineer would be the following. If on the horizontal axis I plot different areas, so you might learn about machine learning, learn about deep learning, learn about probabilistic graphical models, learn about NLP, learn about computer vision, and so on for other areas of AI or machine learning, and the vertical axis is depth, then a lot of the strongest candidates for jobs are T-shaped individuals, meaning that you have a broad understanding of many different topics in AI and machine learning and a very deep understanding in, you know, maybe at least one area, maybe more than one. [00:35:33] And so I think by taking CS230 and doing the things you're doing here, hopefully you're forming a deeper understanding of one of these areas, of deep learning in particular. But the other thing that, you know, deepens your knowledge in one area will be the
projects you work on, the open-source contributions you make, whether or not you've done research, and maybe whether or not you've done an internship. [00:36:08] And I think these two elements, you know, a broad base of skills, and then also going deeper to do a meaningful project in deep learning, or working with a Stanford professor and doing a meaningful research project, or making some contributions to open source published on GitHub that people then use, these are the things that let you deepen your knowledge and convince recruiters that you both have the broad technical skills and, when called on, are able to apply them in a meaningful way to an important problem. [00:36:37] And in fact, the way we designed CS230 is actually a microcosm of this, where, you know, you learn about neural nets, learn about dropout, batch norm, ConvNets, sequence models, right, RNNs; so it actually gives you breadth within the field of deep learning. And then the reason we want you to work on the project is so that you can pick one of these areas and maybe go deep and build a meaningful project in one of them. And it's not just about making a resume look good, right; it's about giving you the practical experience to make sure you actually know how to make these things work, giving you the learnings to make sure you actually know how to make a CNN or an RNN work, and then of course many students also list the projects on their resumes, obviously. [00:37:38] So let's see the failure modes, the bad ways to navigate your career. Um, there are some students that just do this: there are some Stanford students that just take class after class after class and go equally in depth in a huge range of areas. And this is not terrible; you can actually still get a job, sometimes you can even get into some PhD programs like this, without the depth, but this is not the best way to navigate your career. So there are some Stanford students that take tons of classes, even get a good GPA doing that, but do nothing else, and this is not terrible, but it's not great; it's not as good as the alternative. [00:38:22] Um, there's one other thing I've seen Stanford students do, which is just try to jump in on day one and go really, really deep in one area, and again this has its own challenges. I guess, you know, one failure mode that's actually not great is sometimes you get some undergrad freshmen at Stanford that have not yet
learned a lot about coding or software engineering or machine learning and try to jump into a research project right away. This turns out not to be very efficient, because it turns out that courses, you know, online courses and Stanford classes, are a very efficient way for you to learn about these broad areas, and after that, going deeper and getting experience in one vertical area then deepens this knowledge and makes sure you know how to actually make those ideas work. [00:39:08] So I do see sometimes, unfortunately, you know, some Stanford freshmen join us already knowing how to code and having implemented, you know, some learning algorithms, but some students that do not yet have much experience try to jump into a research project right away, and that turns out not to be very productive for the student or for the research group, because until you've taken classes and mastered the basics it's difficult to understand what's
really going on in the advanced projects. So I would say this is actually worse than that: this is actually okay, this is actually pretty bad; I would not do this for your career. [00:39:46] And then the other not-so-great path that you see some Stanford students take is to get a lot of breadth and then do a tiny project here, a tiny project there, a tiny project there, and you end up with ten tiny projects but no one or two really significant projects. And again, this is not terrible, but you know, beyond a certain point, by the way, recruiters are not impressed by volume. So having done ten little projects is actually not nearly as impressive as doing one great project or two great projects. And again, there's more to life than impressing recruiters, but recruiters are very rational, and the
reason recruiters are less impressed by someone whose profile looks like this is because they're probably actually less skilled and less able at doing good work in machine learning compared to someone that has done a substantive project and knows what it takes to see the whole thing through. Does that make sense? So when I say, you know, recruiters are more or less impressed, it's because they're actually quite rational in terms of trying to understand how good you are at doing important work or building meaningful AI systems. [00:41:02] And so in terms of building up the horizontal piece and the vertical piece, this is what I recommend. To build the horizontal piece, a lot of this is about building foundational skills, and it turns out coursework is a very efficient way to do this; you know, in these courses, various instructors like us but many other Stanford
professors put a lot of work into organizing the content to make it efficient for you to learn this material. And then there's also reading research papers, which we just talked about; having a community will help you there. And then the depth often comes from building a deep [00:41:54] and relevant project, and the projects do have to be relevant. So, you know, if you want to build a career in machine learning or in AI, hopefully the project is something that's relevant to CS or machine learning or AI or deep learning. [00:42:07] I do see, I don't know, for some reason I feel like a surprisingly large number of Stanford students I know join a dance crew and spend a long time on that, which is fine; if you enjoy dancing, go have fun, you know, you don't need to work all the time. So go join the dance crew, or go on the overseas exchange program and hang out in London and have fun, but those
things do not as directly contribute to this, right. Yeah, I think in an earlier version of this presentation, you know, students walked away saying, huh, Andrew says we should not have fun and should work all the time, and that's not the goal. [00:43:15] Um, all right, you know, there is the Saturday morning problem, which all of you will face, which is that every week, including this week, on Saturday morning you have a choice: you can read a paper, or work on research, or work on open source, or I don't know what people do, or you can watch TV or something. And you will face this choice maybe every Saturday, you know, for the rest of your life, or for a lot of the Saturdays in the rest of your life. And, um, you know, you can build out those foundational skills, go deep, or go have fun, and you should have fun, all right, just for the record. But one of the problems
that a lot of people face is that even if you spend all Saturday and all Sunday reading research papers, or maybe spend all Saturday and Sunday working hard, it turns out that the following Monday you're not that much better at deep learning. Yeah, you worked really hard, you read five papers, you know, great, but if you work in a research group, the professor, or, you know, your manager if you're in a company, they have no idea how hard you worked, so there's no one to come by and say, oh, good job working so hard all weekend. So no one knows the sacrifices you made all weekend to study or code open-source projects; no one knows. So there's almost no short-term reward to doing these things, whereas there might be short-term rewards to doing other things. [00:45:01] But the secret to this is that it's not about
reading papers really, really hard for one Saturday morning, or for all of one Saturday, and then being done. The secret is to do this consistently, you know, for years, or at least for months. And it turns out that if you read, um, two papers a week and you do that for a year, then you've read about a hundred papers after a year, right, a hundred papers in a year comes out to about two papers a week, and you will be much better at deep learning after that. [00:45:37] And so for you to be successful, it's much less about the intense burst of effort you put in over one weekend; it's much more about whether you can find a little bit of time every week to read a few papers or contribute to open source or take some online courses. And if you do that, you know, every week for six months, or every week for a year, you will actually
learn a lot about these fields and be much better off, and be much more capable at deep learning or machine learning or whatever. [00:46:09] So yeah, actually my wife and I do not own a TV, for what it's worth. But again, you know, make sure you don't drive yourself crazy, and have a healthy work-life integration as well. [00:46:27] All right, so I hope that doing these things, whoa, it's not just about finding a job; it's about doing these things to make you more capable as a machine learning person, so that you have the power to go and implement stuff that matters, to do work that matters. [00:46:49] Well, the second thing I'd like to chat about is selecting a job. And this is actually interesting, um, I gave this public presentation last year, sorry, earlier this year, and shortly after that presentation there was a student in the class who was already
there was already in a company who emailed me saying boy [00:47:09] in a company who emailed me saying boy Andrew I wish you had told me this [00:47:10] Andrew I wish you had told me this before I said to my current job [00:47:12] before I said to my current job so let's see let's see let's see [00:47:16] so let's see let's see let's see hopefully this is be useful to you um so [00:47:19] hopefully this is be useful to you um so it turns out that you know I so when [00:47:26] it turns out that you know I so when you're at some point you're on you be [00:47:27] you're at some point you're on you be deciding you know what peeps apparently [00:47:28] deciding you know what peeps apparently wanna apply for what companies do want [00:47:30] wanna apply for what companies do want higher job ads and I can tell you what [00:47:38] so if you want to keep learning new [00:47:40] so if you want to keep learning new things I think one of the biggest [00:47:43] things I think one of the biggest predictors of your success will be [00:47:45] predictors of your success will be whether or not you're working with great [00:47:47] whether or not you're working with great people and projects right and in [00:47:56] people and projects right and in particular you know there are these [00:47:58] particular you know there are these fascinating results from whether I think [00:48:01] fascinating results from whether I think I want to say from the social sciences [00:48:02] I want to say from the social sciences showing that if your closest friends if [00:48:06] showing that if your closest friends if your five closest friends retain closer [00:48:08] your five closest friends retain closer friends are all smokers there's a much [00:48:10] friends are all smokers there's a much higher chance you become a smoker as [00:48:11] higher chance you become a smoker as well right and if you're five or ten [00:48:13] well right and if you're five or ten close friends are you know overweight 
[00:48:16] close friends are you know overweight there's much higher chance you do the [00:48:18] there's much higher chance you do the same or and conversely there's a you [00:48:21] same or and conversely there's a you know so I think that if your five [00:48:23] know so I think that if your five closest friends work really hard really [00:48:25] closest friends work really hard really long research papers care about the work [00:48:27] long research papers care about the work right learning and making themselves [00:48:29] right learning and making themselves better then there's actually very good [00:48:30] better then there's actually very good chance that you will be that they'll [00:48:32] chance that you will be that they'll influence you that way as well so we're [00:48:34] influence you that way as well so we're all human we all influenced by the [00:48:36] all human we all influenced by the people around us right and so um I think [00:48:40] people around us right and so um I think that and I've been fortunate I've told [00:48:42] that and I've been fortunate I've told the Stanford for a long time now is I've [00:48:44] the Stanford for a long time now is I've been fortunate to have seen a lot of [00:48:46] been fortunate to have seen a lot of students from go from Stanford to [00:48:48] students from go from Stanford to various careers and because I've seen [00:48:50] various careers and because I've seen how many hundreds or maybe low thousands [00:48:53] how many hundreds or maybe low thousands understand the students that I knew [00:48:54] understand the students that I knew right when there are so stem forces go [00:48:56] right when there are so stem forces go on to a separate job I saw many of them [00:48:58] on to a separate job I saw many of them have amazing careers I saw you know if [00:49:01] have amazing careers I saw you know if you have like like okay careers but I [00:49:04] you have like like okay careers but I think over time I've 
learned to patent [00:49:06] think over time I've learned to patent match what is predictive of your future [00:49:09] match what is predictive of your future success after you leave Stanford only [00:49:11] success after you leave Stanford only share view some of those paths share [00:49:12] share view some of those paths share view some of those patterns as you [00:49:13] view some of those patterns as you navigate your career and and and it's [00:49:16] navigate your career and and and it's just a so many options in machine [00:49:17] just a so many options in machine learning today it's kind of tragic if [00:49:19] learning today it's kind of tragic if you don't you know navigate to hopefully [00:49:21] you don't you know navigate to hopefully maximized [00:49:22] maximized one of the people that gets to do fun [00:49:25] one of the people that gets to do fun and important work that helps helps [00:49:26] and important work that helps helps others so when selecting a position I [00:49:32] others so when selecting a position I would advise you to focus on the team [00:49:43] you interact with and by team I mean you [00:49:46] you interact with and by team I mean you know somewhere between ten to thirty [00:49:49] know somewhere between ten to thirty persons right maybe up to fifty because [00:49:53] persons right maybe up to fifty because it turns out that you if you there will [00:49:57] it turns out that you if you there will be some group of people maybe ten to [00:49:59] be some group of people maybe ten to thirty people maybe fifty people that [00:50:01] thirty people maybe fifty people that you interact with quite closely and [00:50:02] you interact with quite closely and these will be appears in the people that [00:50:05] these will be appears in the people that that will influence you the most right [00:50:07] that will influence you the most right because if you join a company with [00:50:10] because if you join a company with 10,000 people you will not 
interact with [00:50:12] 10,000 people you will not interact with all 10,000 people there will be a corps [00:50:14] all 10,000 people there will be a corps of 10 or 30 or 50 people that you [00:50:16] of 10 or 30 or 50 people that you interact with the most and is those [00:50:19] interact with the most and is those people how much they know how much in [00:50:20] people how much they know how much in teach you how hard-working they are [00:50:22] teach you how hard-working they are whether they are learning themselves [00:50:23] whether they are learning themselves that were influenced you the most rather [00:50:25] that were influenced you the most rather than all these other hypothetical 10,000 [00:50:28] than all these other hypothetical 10,000 people in a giant company and of these [00:50:31] people in a giant company and of these people one of the ones that will [00:50:33] people one of the ones that will influence you the most is your manager [00:50:35] influence you the most is your manager alright so make sure you meet your [00:50:37] alright so make sure you meet your manager and get to know them and make [00:50:38] manager and get to know them and make sure there's someone you want to work [00:50:40] sure there's someone you want to work with and in particular I wouldn't [00:50:43] with and in particular I wouldn't recommend focusing on these things and [00:50:45] recommend focusing on these things and not on the brand of the company because [00:50:54] not on the brand of the company because it turns out that the brand of the [00:50:56] it turns out that the brand of the company you work with is actually not [00:50:58] company you work with is actually not that correlated you know maybe there's a [00:51:00] that correlated you know maybe there's a very recall relation but it's actually [00:51:02] very recall relation but it's actually not that correlated with what your [00:51:03] not that correlated with what your personal experience will be like 
Right. [00:51:14] And by the way, just full disclosure: I have a research group here at Stanford, and my research group at Stanford is one of the better-known ones in the world. But don't join us just because you think we're well-known — that's just not a good reason to join us, for the brand. Instead, before you work with someone, meet the people and evaluate the individuals; look at the people and see if you think these are people you can learn from, and whether they're good people.

[00:51:59] So in today's world there are a lot of companies recruiting Stanford students, so let me give you some advice. Sometimes there are giant companies with, let's say, fifty thousand people. And I'm not thinking of any one specific company — if you're trying to guess what company I'm thinking of, there's no one specific company; this pattern matches many large companies. So maybe there's a giant company with fifty thousand people, and let's say they have a 300-person AI team. It turns out that if you look at the work of this few-hundred-person AI team, and they send you a job offer to join the 300-person AI team, that might be pretty good — this may be the group working on the papers you read about in the news. So if you get a job offer to work with this group, that might be pretty good. Although even within a 300-person AI team it's actually difficult to tell what's good and what's not — there's often a lot of variance even within it. Even better would be if you get a job offer to join the specific 30-person team, so you actually know who your manager is, who your peers are, who you're working with. And if you think these are thirty great people to learn from, that could be a great job offer.

[00:53:27] The failure mode that, unfortunately, I've seen several Stanford students go down — and this is a true story — there was one several years ago, a Stanford student whom I thought was a great guy. I knew his work; he was coding machine learning algorithms; I thought he was very sharp and did very good work with some of my PhD students. He got a job offer from one of these giant companies that has a great AI group, but his offer wasn't to go to the AI group; his offer was "join us, and we will assign you to a team." So this student — a capable student whom I knew and cared about — wound up, after accepting the job offer, being assigned to a really, um, boring Java back-end payments team, working on Java back-end payment processing systems. If that's your thing, great — but this student was assigned to that team and he was really bored. This was a student whose career I personally saw rising while he was at Stanford, and after he went to this, frankly, not very interesting team, I saw his career plateau. After about a year and a half he resigned from the company, having wasted a year and a half of his life and missed out on a year and a half of this very exciting growth of AI and machine learning. So it was very unfortunate. And actually, after I told this story the last time I taught this class earlier this year, someone in a similar position from the same big company found me and said, "Boy, Andrew, I wish you had told me this story earlier, because that's exactly what happened to me at the same big company." Now I want to share with you a different thing.

[00:55:17] So I would just be careful about rotation programs as well, when a company is trying to recruit you. If a company refuses to tell you what project you'll work on, who your manager will be, and exactly what team you're joining as an IC, I do not find those job offers that attractive. Because if they refuse to tell you what team you're going to work with, well, chances are that telling you the answer would not make the job more attractive to you — that's why they're not telling you. So I'd just be very careful. Sometimes rotation programs sound good on paper, but it's really "well, we'll figure out where to send you later." I feel like I've seen some students go into rotation programs that sound good on paper, that sound like a good idea. But just
as you wouldn't, after you graduate from Stanford, go do four internships and then apply for a job — that would be a weird thing to do — sometimes rotation programs amount to "come and do four internships, and then we'll let you apply for a job and we'll see where we send you," and then they put you on the Java back-end payment processing system. So just be cautious about the marketing of rotation programs. Again, if what they say is "do a rotation and then you join this team," then you can look at that team and say, "Yep, that's a great team; I want to do a rotation, but then I'll go work with this team, and these are the 30 people I'll work with" — that could be great. But if through a rotation they could send you anywhere in this giant company, I would just be very careful.

[00:56:43] Now, on the flip side, there are some companies — I'm not going to mention any names — that are not as glamorous, that don't have as cool a brand. Maybe it's only a 10,000-person company, or 1,000, or 50,000, or whatever. I have seen many companies that are not super well-known in the AI world and are not in the news all the time, but they may have a very elite team of a hundred people doing great work on machine learning. There are definitely companies whose brands are not the first you think of when you think of great companies, but that sometimes have a really great ten-person or fifty-person or hundred-person team all working on the algorithms. And even if the overall brand of the company, you know, is maybe even a little bit sucky, if you manage to track down this team, and if you have a job offer to join this elite team within a much bigger company, you could actually learn a lot from these people and do important work.

[00:57:48] One of the things about Silicon Valley is that the brand on your resume matters less than ever before. I guess with the exception of the Stanford brand — you totally want the Stanford brand on the resume — but with that exception, really, Silicon Valley — sorry, the world — has become really good at evaluating people on their genuine technical skills and genuine capability, and less on their brand. So I would recommend that instead of trying to get the best stamps of approval on your resume, you go ahead and take the positions that give you the best learning experiences and also allow you to do the most important work. And that is really shaped by the, you know, 30 or 50 people you work with, and not by the overall brand of the company you work with. So there's a huge variance across teams within one company, and that variance is actually pretty big — it might be bigger than the variance across different companies. And if a company refuses to tell you what team you would join, I would seriously consider — well, if you have a better option, I would do something else.

[00:58:56] And then, finally — again, I don't want to name companies, but think of some of the large retailers, or some large healthcare systems, or a lot of companies that are not well-known in the AI world — I've met their AI teams and I think they're great. And so if you're able to find those jobs and meet their people, you can actually get very exciting jobs there. But of course, for the giant companies that have big AI teams, if you can join the leading AI team, that's also great — I'm a bit biased, since I used to lead some of these leading AI teams — so I think those are great too, but only some teams, not all.

[00:59:39] Lastly, just general advice — and this is how I really live my life: I tend to choose things to work on that allow me to learn the most, and to try to do important work. Especially if you're relatively early in your career, what you learn will pay off for a long time, and so joining teams and working with a great set of 10 or 30 or 50 teammates will let you learn a lot. And also — just don't join, like, a cigarette company and help, you know, give
[01:00:33] cigarette company and help you know give more people cancer or stuff like there [01:00:35] more people cancer or stuff like there is this don't don't do this don't don't [01:00:37] is this don't don't do this don't don't do stuff like that but if you can do [01:00:39] do stuff like that but if you can do meaningful work that helps other people [01:00:40] meaningful work that helps other people and do important work and also learn a [01:00:43] and do important work and also learn a lot on the way hopefully you can find [01:00:45] lot on the way hopefully you can find positions like that that lets you set [01:00:49] positions like that that lets you set yourself up for long-term success but [01:00:50] yourself up for long-term success but also do work that you think matters in [01:00:52] also do work that you think matters in that and then helps other people [01:00:54] that and then helps other people alright um any questions it was [01:01:11] alright um any questions it was important you know yeah um I think one [01:01:15] important you know yeah um I think one of the most meaningful things you do in [01:01:16] of the most meaningful things you do in life is how people either advance the [01:01:19] life is how people either advance the human condition or help other people but [01:01:21] human condition or help other people but the thing is I'm nervous I don't a name [01:01:23] the thing is I'm nervous I don't a name one or two things because the world [01:01:25] one or two things because the world needs a lot of people who work on a lot [01:01:26] needs a lot of people who work on a lot of different things so the world's not [01:01:29] of different things so the world's not gonna function if everyone works on [01:01:30] gonna function if everyone works on computational biology I think umpire is [01:01:32] computational biology I think umpire is great but it's actually good that what [01:01:35] great but it's actually good that what people work on compile 
my PhD students [01:01:37] people work on compile my PhD students like you know mainly work on the outside [01:01:40] like you know mainly work on the outside to healthcare my team at landing er does [01:01:43] to healthcare my team at landing er does a lot of work on the outside [01:01:43] a lot of work on the outside manufacturing agriculture to some [01:01:46] manufacturing agriculture to some healthcare and some other industries I [01:01:49] healthcare and some other industries I actually especially California fires [01:01:51] actually especially California fires burning you know I actually think that [01:01:54] burning you know I actually think that there's important work to be done in AI [01:01:55] there's important work to be done in AI climate change but I think that there's [01:01:59] climate change but I think that there's a lot of them important work a lot of [01:02:01] a lot of them important work a lot of industries right so I actually think [01:02:04] industries right so I actually think that you know I should think that the [01:02:05] that you know I should think that the next wave of AI sees me I should say [01:02:08] next wave of AI sees me I should say machine learning is we've we've already [01:02:10] machine learning is we've we've already young transform a lot of the tech well [01:02:13] young transform a lot of the tech well right so you know yeah I mean we've [01:02:18] right so you know yeah I mean we've already helped a lot of the [01:02:19] already helped a lot of the circumvallate tech world become good at [01:02:22] circumvallate tech world become good at AI and that's big right how build a [01:02:23] AI and that's big right how build a couple of the teams that wound up doing [01:02:25] couple of the teams that wound up doing this right Google brain how Google [01:02:27] this right Google brain how Google become cognitive learning the battery I [01:02:29] become cognitive learning the battery I hope I do become you know couldn't one 
[01:02:32] hope I do become you know couldn't one of the greatest companies in the world [01:02:33] of the greatest companies in the world set in China and I'm very happy that [01:02:37] set in China and I'm very happy that between me and so my friends in the [01:02:39] between me and so my friends in the industry we've made a lot of good AI [01:02:41] industry we've made a lot of good AI companies I think part of the next phase [01:02:43] companies I think part of the next phase for the evolution of machine learning is [01:02:46] for the evolution of machine learning is fair to go into not just to check [01:02:48] fair to go into not just to check companies like you know like the Google [01:02:51] companies like you know like the Google and Baidu which I hope this was you know [01:02:52] and Baidu which I hope this was you know Facebook Microsoft which had nothing to [01:02:54] Facebook Microsoft which had nothing to do as well as well that was a BMP [01:02:57] do as well as well that was a BMP Pinterest ruber right all these like [01:02:59] Pinterest ruber right all these like great companies that hope they'll all [01:03:00] great companies that hope they'll all embrace a yard but I think some of the [01:03:02] embrace a yard but I think some of the most exciting work to be done stores [01:03:03] most exciting work to be done stores also look outside to check industry and [01:03:06] also look outside to check industry and to look at all the sometimes calling [01:03:08] to look at all the sometimes calling traditional industries that do not have [01:03:10] traditional industries that do not have shiny tech things because I think the [01:03:13] shiny tech things because I think the value creation there as surprised you [01:03:15] value creation there as surprised you could implement there maybe even bigger [01:03:18] could implement there maybe even bigger than if you you know yeah I mention one [01:03:23] than if you you know yeah I mention one interesting thing 
one thing I notice is [01:03:25] that many of the large tech companies all work on the same problems, right? Everyone works on machine translation, everyone works on speech recognition, face detection, click-through rate prediction. Part of me feels like this is great, because it means there's a lot of progress in machine translation, and that's great, we do want progress in machine translation. But sometimes when we look at other industries, you know, when you look at manufacturing, or at how some of the medical devices work, or sometimes when I hang out with farmers on farms, I feel like in my own work, my team's work, we're sometimes stumbling across brand-new research problems that the big tech companies do not see and have not yet even framed. So I find one way to be in search of exciting challenges is actually to be constantly on the cutting edge, looking at these types of
problems. It's a different cutting edge than the cutting edge at the big tech companies. So I think some of you are joining big tech companies, and that's great, we need more AI in the big companies and the tech companies, but I think a lot of the exciting work to do in AI is also outside what we traditionally considered tech. All right, we're at time, so I hope this was helpful. Let's break for today. Have a great Thanksgiving, everyone, and we'll see you in a couple of weeks.

================================================================================
LECTURE 009
================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 9 - Deep Reinforcement Learning
Source: https://www.youtube.com/watch?v=NP2XqpgTJyo
---
Transcript

[00:00:05] Hi everyone, and welcome to lecture nine of CS 230. Today we're going to discuss an advanced topic that is kind of the marriage between deep learning and another field of AI, which is reinforcement learning, and we will see a practical application
and how deep learning methods can be plugged into another family of algorithms. It's interesting because deep learning methods and deep neural networks have been shown to be very good function approximators; essentially that's what they are: we give them data so that they can approximate a function. There are a lot of different fields which require these function approximators, and deep learning methods can be plugged into all of them; this is one of those examples. So we'll first motivate the setting of reinforcement learning: why do we need reinforcement learning, why can't we use deep learning methods to solve everything? There is a set of problems that we cannot solve with deep learning alone, and reinforcement learning applications are examples of that. We will see an example to introduce a reinforcement learning
algorithm called Q-learning, and we will add deep learning to this algorithm to make it deep Q-learning. As we've seen with generative adversarial networks and also with deep neural networks in general, most models are hard to train: we had to come up with initialization schemes, with dropout, with batch norm, and with myriads of methods to make deep neural networks train, and with GANs we had to use tricks and hacks as well in order to train them. So here we will see some of the tips and tricks to train deep Q-learning, which is a reinforcement learning algorithm. At the end we will have a guest speaker coming to talk about advanced topics, mostly research, combining deep learning and reinforcement learning. Sounds good? Okay, let's go. So deep reinforcement learning is a very recent field, I would say, although
reinforcement learning has existed for a long time; only recently has it been shown that using deep learning to approximate the functions that play a big role in reinforcement learning algorithms works well. One example is AlphaGo, and you've probably all heard of it: Google DeepMind's AlphaGo has beaten world champions at the game of Go, which is a very old strategy game. The one on the right here, or on your right, "Human-level control through deep reinforcement learning," is also a Google DeepMind paper that came out and hit the headlines on the front page of Nature, which is one of the leading multidisciplinary peer-reviewed journals in the world. They've shown that with deep learning plugged into a reinforcement learning setting, they can train an agent that beats human level on a variety of games; in fact, these are Atari games. So
they've actually shown that their algorithm, the same algorithm reproduced for a large number of games, can beat humans on most of these games, though not all of them. So these are two examples; although they use different sub-techniques of reinforcement learning, they both include some deep learning aspect. Today we will mostly talk about human-level control through deep reinforcement learning, also called the deep Q-network, presented in this paper. So let's start by motivating reinforcement learning using the AlphaGo setting. This is a board of Go, and the picture comes from the DeepMind blog. You can think of Go as a strategy game where you have a grid that is up to 19 by 19, and you have two players: one player has white stones and one player has black stones. At every step in the game you can position a stone on the board, on
one of the grid crossings. The goal is to surround your opponent, that is, to maximize your territory by surrounding your opponent, and it's a very complex game for different reasons. One reason is that you cannot be short-sighted in this game; you have to have a long-term strategy. Another reason is that the board is so big; it's much bigger than a chessboard, right? A chessboard is 8 by 8. So let me ask you a question: if you had to build an agent that solves this game and beats humans, or at least plays very well, with the deep learning methods that you've seen so far, how would you do that? [00:05:33] Someone wants to try? So let's say you have to collect a data set, because in classic supervised learning we need a data set with x and y. What do you think would be your x and y? [00:05:58] Yeah, okay: the input is the game board and the output is the probability of victory in that position. So that's a good one, I think
[00:06:05] an input-output pair. What's the issue with that one? So yeah, it's super hard to represent what the probability of winning is from this board. Nobody can tell; even if I ask an expert human to come and tell us what the probability of black winning or white winning is in this setting, they wouldn't be able to. So this is a little more complicated. Any other ideas for data sets? Yep, okay, good point: we could take the grid, like this one, as the input, and the output would be the move, the next action taken by, probably, a professional player. So we would just watch professional players playing, record their moves, and build a data set of what a professional move is, and we would hope that our network, using this input-output, will at some point learn how professional players play, and given an input state of the board will be
able to decide on the next move. What's the issue with that? [00:07:30] Yes: you need a whole lot of data. Why? And you said it: because we basically need to represent all types of positions of the board, all the states. So let's do that: if we were to compute the number of possible states of this board, what would it be, for a 19-by-19 board? [00:08:06] Remember what we did with adversarial examples; we did it for pixels, and now we're doing it for a board. Yes, you want to try? Yeah, 3 to the power of 19 times 19, that is, 3 to the 361. And why is that? Each spot, and there are 19 times 19 spots, can have 3 states: no stone, white stone, or black stone. So this is all the possible states, and it's about 10 to the 172, so it's super, super big. We can probably not get even close to that by observing professional players.
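As a quick sanity check on that count, here is a short snippet of mine (not from the lecture):

```python
import math

# Each of the 19 x 19 = 361 intersections can be empty, hold a white stone,
# or hold a black stone, so an upper bound on the number of board
# configurations is 3**361.
num_states = 3 ** (19 * 19)
print(f"about 10^{math.log10(num_states):.0f}")  # prints: about 10^172
```

(The number of *legal* Go positions is somewhat smaller, but still astronomically large, far beyond anything a data set of professional games could cover.)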
First, because we don't have enough professional players, and because we're humans and we don't have infinite lives, the professional players cannot play forever; they might get tired. So one issue is that the state space is too big. Another one is that the ground truth would probably be wrong: it's not because you're a professional player that you will play the best move every time, right? Every player has their own strategy, so the ground truth we're getting here is not necessarily true, and our network might not be able to beat these human players; what we're looking for here is an algorithm that beats humans. Okay, so the second issue: too many states in the game, as you mentioned. And the third one: we will likely not generalize. The reason we will not generalize is that in classic supervised learning we're looking for patterns: if I ask you to build an algorithm to detect cats versus dogs, it will look for
what the pattern of a cat is versus what the pattern of a dog is, in the convolutional filters we learn. In this case it's about a strategy, not a pattern: you have to understand the process of winning this game in order to make the next move, and you cannot generalize if you don't understand this process of long-term strategy. So we have to incorporate that, and that's where RL comes into place. RL is reinforcement learning, a method that could be described in one sentence as automatically learning to make good sequences of decisions. So it's about the long term, not the short term, and we would use it generally when we have delayed labels, like in this game: the label that you mentioned at the beginning was the probability of victory, which is a long-term label; we cannot get this label now, but over time, the closer we get to the end, the better we
are at seeing the victory or not. And it's for making sequences of decisions: we make a move, then the opponent makes a move, then we make another move, and all these decisions are correlated with each other; you have to plan in advance, and when you're human you do that when you play chess or Go. So examples of RL applications include robotics, and it's still a research topic how deep RL can change robotics, but think about having a robot walking from here, and you want to send it there. What you're teaching the robot is: if you get there, it's good, you've achieved the task. But I cannot give you the probability of getting there at every point; I can help you out by giving you a reward when you arrive there and letting you do trial and error. So the robot will try: randomly initialized, the
robot might just fall down at first and get a negative reward; then it repeats, and this time the robot knows that it shouldn't fall down, it shouldn't go that way, it should probably go this way. So through trial and error and long-term reward, the robot is supposed to learn this pattern. Another application is games, and that's the one we will see today: games can be represented as a set of rewards for a reinforcement learning algorithm. This is where you win, this is where you lose; we let the algorithm play and figure out what winning means and what losing means until it learns. The problem with using plain deep learning is that the algorithm will not learn, because the reward is too long-term, so we use reinforcement learning. And finally, advertising: a lot of advertising is real-time bidding, so given a budget, you want to know when to invest this budget, and this is a long-
term strategy-planning problem as well, which reinforcement learning can help with. Okay, so this was the motivation for reinforcement learning. Now we're going to jump to a concrete example, a super-vanilla example, to understand Q-learning. So let's start with this game, or environment; we generally call it an environment, and it has several states, in this case five states. We have these states and we can define rewards, which are the following. So what is our goal in this game? We define it as maximizing the return, or the reward, over the long term. And what is the reward? It's the numbers that you have here, which were defined by a human; this is where the human defines the reward. Now, what's the game? The game has five states. State one is a trash can and has a reward of plus two. State two is the starting state, the initial state, and we assume that we start in the initial state with
[00:13:26] a plastic bottle in our hand. The goal will be to throw this plastic bottle in a can: if it's the trash can, we get plus two; if we get to state five, the recycle bin, we get plus ten, a super important application. State four has a chocolate, so if you go to state four you get a reward of one, because you can eat the chocolate, and you can also then throw the chocolate in the recycle bin, hopefully. That's the setting; makes sense? So these states are of three types: the starting or initial state, which is brown; the normal state, which is neither a starting nor an ending state, and it's gray; and the blue states, which are terminal states, so if we get to a terminal state we end the game, or the episode, let's say. That's the setting; makes sense? Okay, and there are two possible actions: you have to move
either to the left or to the right. An additional rule we'll add is that the garbage collector will come in three minutes, and every step takes you one minute, so you cannot spend more than three minutes in this game; in other words, you cannot stay at the chocolate and eat chocolate forever, you have to move at some point. Okay, so one question I have is: how do you define the long-term return? Because we said we want a long-term return; we don't care about short-term returns. What do you think is a good way to define the long-term return here? [00:15:12] Yeah: the sum of how many points you have when you reach the terminal state. So let's say I'm in state 2 and I have 0 reward right now; if I reach the terminal state on your left, the plus 2, I get a plus-2 reward and I finish the game.
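The environment just described, with its five states, left/right actions, and three-step limit, can be sketched in a few lines of Python. This is my own sketch for illustration; the state numbering and function names are not from the lecture.

```python
# State 1 = trash can (+2, terminal), 2 = start, 3 = empty cell,
# 4 = chocolate (+1), 5 = recycle bin (+10, terminal).
REWARDS = {1: 2, 2: 0, 3: 0, 4: 1, 5: 10}
TERMINAL = {1, 5}
MAX_STEPS = 3  # the three-minute garbage-collector rule

def step(state, action, t):
    """action is -1 (left) or +1 (right); returns (next_state, reward, done)."""
    next_state = min(max(state + action, 1), 5)
    reward = REWARDS[next_state]
    done = next_state in TERMINAL or t + 1 >= MAX_STEPS
    return next_state, reward, done

# Walk right from the start state 2: 2 -> 3 -> 4 -> 5, collecting 0, 1, 10.
state, t, rewards, done = 2, 0, [], False
while not done:
    state, r, done = step(state, +1, t)
    rewards.append(r)
    t += 1
print(rewards)  # prints: [0, 1, 10]
```

Going left instead ends the episode immediately: `step(2, -1, 0)` returns `(1, 2, True)`, the plus-2 terminal state.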
[00:15:39] If instead I go to the right and reach the plus 10, you're saying that the long-term return can be the sum of all the rewards I got on the way there, so plus 11. So this is one way to define the long-term return. Any other ideas? [00:16:05] We probably want to incorporate the time steps and reduce the reward as time passes, and in fact this would be called a discounted return, versus what you said, which would be called a return. Here we use a discounted return, that is, r1 + gamma*r2 + gamma^2*r3 + ... for a discount factor gamma between 0 and 1, and it has several advantages. Some are mathematical: the return you described, which is not discounted, might not converge, it might go up to plus infinity, while the discounted return will converge with an appropriate discount. Also, intuitively, why is the discounted return intuitive? Because time is always an important factor in our decision making: people prefer cash now to cash in 10 years, right? Or, similarly, you can consider that the robot has a
limited life expectancy: it has a battery and loses battery every time it moves, so you want to take this into account. [00:17:03] If I can eat a chocolate that is close, I go for it, because if the chocolate is too far I might not get there, since I'm losing some battery, some energy, for example. So this is the discounted return. [00:17:12] Now, if we take gamma equals 1, which means we have no discount, the best strategy to follow in this setting seems to be to go to the right, starting in the initial state 2, right? And the reason is a simple computation: on one side I get plus 2, on the other side I get plus 11. What if my discount was 0.1? Which one would be better? [00:17:43] Yeah, the left would be better, going directly to the plus 2, and the reason is because we compute in our mind: we just do 0 plus 0.1 times 1, which gives us 0.1, plus 0.1 squared
times 10, and it's less than 2, we know it. [00:18:04] Okay, so now we're going to assume that the discount is 0.9; it's a very common discount to use in reinforcement learning, and it's the discount we'll use here. [00:18:13] So the general question here, and it's the core of reinforcement learning, in this case of Q-learning, is: what do we want to learn? Really think of it as a human: what would you like to learn? What are the numbers you need to have in order to be able to make decisions really quickly, assuming you had a lot more states and actions than this? [00:18:38] Any ideas of what we want to learn? What would help our decision-making? [00:18:55] The optimal action at each state. Yeah, that's exactly what we want to learn: for a given state, tell me the action that I should take. For that I need to have a score for all the actions in every state, and in order to store these scores we need a matrix, right? So this is our matrix.
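As a quick check of the two comparisons just made, the discounted return of a path of rewards can be sketched in a few lines (the helper name and the path lists are mine, following the example's rewards):

```python
def discounted_return(rewards, gamma):
    """Sum of rewards along a path, discounting by gamma for each extra step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# From state 2: going left reaches the +2 terminal in one step;
# going right collects 0 (enter s3), then 1 (enter s4), then 10 (enter s5).
left, right = [2], [0, 1, 10]

print(discounted_return(right, 1.0))  # 11.0: beats the 2 on the left, go right
print(discounted_return(right, 0.1))  # 0 + 0.1*1 + 0.01*10 ~ 0.2: now go left
```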
[00:19:13] We will call it a Q-table. It's going to be of shape number of states times number of actions. If I have this matrix of scores, and the scores are correct, then if I'm in state 3 I can look at the third row of this matrix and see what's the maximum value I have: is it the first one or the second one? If it's the first one, I go to the left; if it's the second one that is maximum, I go to the right. This is what we would like to have. Does that make sense, [00:19:41] this Q-table? So now let's try to build the Q-table for this example. If you had to build it, you would first think of it as a tree. Oh, and by the way, every entry of this Q-table tells you how good it is to take this action in that state: the state corresponding to the row, the action corresponding to the column. [00:20:03] So now, how do we get there? We can build a tree, and that's similar to what we would do in our mind. We start in s2. In s2 we have two
options: either we go to s1 and we get 2, or we go to s3 and we get 0. From s1 we cannot go anywhere, it's a terminal state; but from s3 we can go to s2 and get 0 by going back, or we can go to s4 and get 1. Does that make sense? From s4, same thing: we can get 0 by going back to s3, or we can go to s5 and get plus 10. [00:20:39] Now, here I just have my immediate reward for every state. What I would like to compute is the discounted return for all the states, because ultimately what should lead my decision-making in a state is: if I take this action, I get to a new state, and what's the maximum reward I can get from there in the future, not just the reward I get in that state? If I take the other action, I get to another state; what's the maximum reward I could get from that state, not just the immediate reward I get from going to that state? [00:21:10] So here's what we would do; we can do it together.
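The game just described can be written down as a tiny environment sketch. The state numbering, the reward placement (a reward is granted on entering a state), and the function name are my reading of the example, not code from the course:

```python
# States s1..s5; entering s1 pays +2, s4 pays +1, s5 pays +10.
ENTER_REWARD = {1: 2, 2: 0, 3: 0, 4: 1, 5: 10}
TERMINAL = {1, 5}  # the game ends at either chocolate

def step(state, action):
    """Take 'left' or 'right'; return (next_state, reward, done)."""
    next_state = state - 1 if action == "left" else state + 1
    return next_state, ENTER_REWARD[next_state], next_state in TERMINAL

# Walking right from s2 all the way to the +10 chocolate:
state, total, done = 2, 0, False
while not done:
    state, reward, done = step(state, "right")
    total += reward
print(total)  # 0 + 1 + 10 = 11
```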
[00:21:12] Let's say we want to compute the value of the actions from s3, going right and left. From s3 I can either go to s4 or s2. Going to s4, I know that the immediate reward is 1, and I know that from s4 I can get plus 10, which is the maximum I can get. So I can discount this 10: 10 times 0.9 gives us 9, plus 1, which was the immediate reward, gives us 10. So 10 is the score that we give to the action 'go right' from state s3. [00:21:48] Now, what if we do it from one step before, from s2? From s2 I know that I can go to s3; at s3 I get 0 reward, so the immediate reward is 0, but I know that from s3 I can ultimately get 10 reward in the long term. I need to discount this reward by one step, so I multiply this 10 by 0.9 and get 0 plus 0.9 times 10, which gives me 9. So now, in state 2, going right will give us a long-term reward of 9. Makes sense? [00:22:18] And you do the same thing: you can copy
back: going from s4 to s3 will give you 0 plus the maximum you can get from s3, which was 10, discounted by 0.9. Or you can do it for going back to s2: from s2 I can go left and get plus 2, or go right and get 9; the immediate reward of going back would be 0, and I discount the 9 by 0.9 and get 8.1. So that's the process we would follow to compute this, and you see that it's an iterative algorithm: [00:22:50] I just copy back all these values into my matrix, and now if I'm in state 2 I can clearly say that the best action seems to be to go to the right, because its long-term discounted reward is 9, while the long-term discounted reward for going to the left is 2. [00:23:09] And I'm done, that's Q-learning! I solved the problem I had: I had a problem statement, and I found a matrix that tells me in every state what action I should take. I'm fine.
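The copy-back procedure just walked through is, in effect, a small value-iteration loop over the Q-table. A sketch under the same assumptions (gamma = 0.9, rewards granted on entering s1, s4, and s5, with s1 and s5 terminal):

```python
GAMMA = 0.9
ENTER_REWARD = [2, 0, 0, 1, 10]   # s1..s5, indexed 0..4
TERMINAL = {0, 4}                 # s1 and s5 end the game

# Q[s][a]: action a=0 is "left" (to s-1), a=1 is "right" (to s+1)
Q = [[0.0, 0.0] for _ in range(5)]

for _ in range(100):  # sweep until the copied-back values converge
    for s in range(1, 4):  # only s2..s4 are non-terminal
        for a, s_next in ((0, s - 1), (1, s + 1)):
            future = 0.0 if s_next in TERMINAL else max(Q[s_next])
            Q[s][a] = ENTER_REWARD[s_next] + GAMMA * future

print(Q[1])  # s2: left 2.0, right 9.0  -> go right
print(Q[2])  # s3: left 8.1, right 10.0 -> go right
print(Q[3])  # s4: left 9.0, right 10.0 -> go right
```

These are exactly the entries worked out on the board: 9 for going right from s2, 8.1 for going back from s3, and so on.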
[00:23:26] So why do we need deep learning? It's a question we will try to answer. So, the best strategy to follow with 0.9 is still right, right, right, and the way I see it is that I just look at my matrix at every step and I always follow the maximum of my row. From state 2, 9 is the maximum, so I go right; from state 3, 10 is the maximum, so I still go right; and from state 4, 10 is the maximum, so I go right again. So I take the maximum over all the actions in a specific state. [00:23:58] Okay, now one interesting thing to note is that when you do this iterative algorithm, at some point it should converge, and ours converged to some values that represent the discounted rewards for every state and action. There is an equation that this Q function follows, and we know that the optimal Q function follows this equation; the one we have here follows this equation. It is called the Bellman equation, and it has
two terms: one is R, and one is [00:24:27] the discount times the maximum of the Q scores over all the actions. So how does that make sense? Given that you are in state s and want to know the score of taking action a in this state, the score should be the reward that you get by going there, plus the discount times the maximum you can get in the future. That's exactly what we used in the iteration. Does this Bellman equation make sense? [00:24:58] Okay, so remember, this is going to be very important in Q-learning, this Bellman equation. It's the equation that is satisfied by the optimal Q-table or Q function, and if you try all these entries you will see that they follow this equation. [00:25:10] When the Q function is not optimal, it's not following this equation yet; we would like it to follow this equation. Another point of vocabulary in reinforcement learning is a policy. A policy is denoted pi, sometimes
or mu; and, sorry: [00:25:29] pi of s is equal to the argmax over the actions of the optimal Q. So what does it mean? It means it's exactly our decision process: given that we're in state s, we look at all the columns of this state s in our Q-table, we take the maximum, and this is what pi of s is telling us. It's telling us: this is the action you should take. So pi, our policy, is our decision-making, okay? It tells us what's the best strategy to follow in a given state. Any questions so far? [00:26:05] Okay, and so I have a question for you: why is deep learning helpful? Yes, that's very true: the number of states is way too large to store a table like that. If you have a small number of states and a small number of actions, then it's easy: you can use a Q-table; you can, at every state, look into the Q-table, which is super quick, and find out what you should do. But ultimately, this Q-table will get bigger and bigger, depending on the application, right?
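Putting the two definitions together, a short sketch (variable names mine, table values the ones worked out earlier) can check that the finished table satisfies the Bellman equation, Q(s, a) = r + gamma * max over a' of Q(s', a'), and read the policy pi(s) off each row:

```python
GAMMA = 0.9
ACTIONS = ["left", "right"]
# Worked-out Q-table from the example: rows s2..s4, columns [left, right].
Q = {2: [2.0, 9.0], 3: [8.1, 10.0], 4: [9.0, 10.0]}
ENTER_REWARD = {1: 2, 2: 0, 3: 0, 4: 1, 5: 10}
TERMINAL = {1, 5}

def bellman_rhs(s, a):
    """r + gamma * max_a' Q(s', a') for one (state, action) entry."""
    s_next = s - 1 if a == 0 else s + 1
    future = 0.0 if s_next in TERMINAL else max(Q[s_next])
    return ENTER_REWARD[s_next] + GAMMA * future

def policy(s):
    """pi(s) = argmax_a Q*(s, a): the best-scoring column of row s."""
    return ACTIONS[Q[s].index(max(Q[s]))]

for s in Q:
    for a in range(2):
        assert abs(Q[s][a] - bellman_rhs(s, a)) < 1e-9  # the table is optimal
print([policy(s) for s in Q])  # ['right', 'right', 'right']
```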
[00:26:42] And the number of states for Go is 10 to the power 170, approximately, which means that this matrix should have a number of rows equal to a 1 with 170 zeros after it. You know what I mean, it's very big. And the number of actions is also going to be bigger: in Go you can place your stone anywhere on the board that is available, of course. [00:27:06] Okay, so: way too many states and actions. So we would need to come up with maybe a function approximator that can give us the action based on the state, instead of having to store this matrix. That's where deep learning will come in. [00:27:24] So, just to recap these first 30 minutes in terms of vocabulary: we learned what an environment is, it's the general game definition; an agent is the thing we're trying to train, the decision-maker; a state, an action, a reward, a total return, a discount factor; the Q-table, which is the matrix of entries representing how
[00:27:46] the matrix of entries representing how good it is to take action a in state s a [00:27:49] good it is to take action a in state s a policy which is our decision making [00:27:51] policy which is our decision making function telling us what's the best [00:27:53] function telling us what's the best strategy to apply in a state and Batman [00:27:55] strategy to apply in a state and Batman equation which is satisfied by the [00:27:57] equation which is satisfied by the optimal cue table now we will tweak this [00:28:00] optimal cue table now we will tweak this cue table into a cue function and that's [00:28:03] cue table into a cue function and that's where we shift from cue learning to deep [00:28:06] where we shift from cue learning to deep cue learning so find a cue function to [00:28:09] cue learning so find a cue function to replace the few table ok so this is the [00:28:13] replace the few table ok so this is the setting we have our problem statement we [00:28:15] setting we have our problem statement we have our cue table we want to change it [00:28:17] have our cue table we want to change it into a function approximator that will [00:28:20] into a function approximator that will be our neural network does that make [00:28:24] be our neural network does that make sense how deep learning comes into [00:28:26] sense how deep learning comes into reinforcement learning here so now we [00:28:29] reinforcement learning here so now we take a state as input for word [00:28:31] take a state as input for word propagated in the deep network and get [00:28:34] propagated in the deep network and get an output which is an action an action [00:28:38] an output which is an action an action score for all the actions it makes sense [00:28:42] score for all the actions it makes sense to have an output layer that is the size [00:28:45] to have an output layer that is the size of the number of actions because we [00:28:48] of the number of actions because we don't want to 
We don't want to give an action as input together with the state and get the score for just that action taken in that state; instead, we can be much quicker: you just give the state as input, get the whole distribution of scores over the output, and select the maximum of this vector, which will tell us which action is best. [00:29:10] So if we're in state 2, let's say, and we forward propagate state 2, we get two values, which are the scores of going left and right from state 2; we can select the maximum of those and it will give us our action. [00:29:27] The question is how to train this network. We know how to train it, we've been learning it for nine weeks: compute the loss, backpropagate. Can you guys think of some issues that make this setting different from a classic supervised learning setting? [00:29:52] The reward changes dynamically? So, the reward
[00:29:56] doesn't change; the reward is set, you define it at the beginning, and it doesn't change, okay? But I think what you meant is that the Q score changes dynamically. That's true, the Q scores change dynamically, but that's probably okay, because our network changes too: our network is now the Q score, so when we update the parameters of the network, it updates the Q scores. What's another issue that we might have? [00:30:21] No labels. Remember, in supervised learning you need labels to train your network. What are the labels here? And don't say 'compute the Q-table and use it as labels', that's not going to work, okay? [00:30:41] So that's the main issue that makes this problem very different from classic supervised learning. So let's see how deep learning can be tweaked a little, and we want you to see these techniques because they're helpful when you read a variety of research papers. We have our network:
[00:30:59] given a state, it gives us two scores that represent the actions of going left and right from this state. The loss function that we'll define: is it a classification problem or a regression problem? [00:31:06] A regression problem, because the Q score doesn't have to be a probability between 0 and 1; it's just a score that you want to give, and it should mimic the long-term discounted reward. So in fact, the loss function we can use is the L2 loss function: y minus the Q score, squared. So let's say we do it for the Q of going to the right. [00:31:37] The question is: what is y, what is the target for this Q? Remember, what I copied at the top of this slide is the Bellman equation. We know that the optimal Q should follow this equation, we know it. [00:31:54] The problem is that this equation depends on its own Q; you have Q on both sides of the equation. It means
if you set the label to be R plus gamma times the max of Q, then when you backpropagate you will also have a derivative there. Let me go into the details and define the target value. [00:32:13] Let's assume that going left is better than going right at this point in time: we initialize the network randomly, we forward propagate state 2 in the network, and the Q score for left is more than the Q score for right, so the action we will take at this point is going left. [00:32:30] Let's define our target y as the reward you get when you go left, the immediate one, plus gamma times the maximum of all the Q values you get from the next state. [00:32:53] Let me spend a little more time on that, because it's a little complicated. I'm in s, I move to s-next using a move to the left, I get an immediate reward R, and I also get a new state, s prime, s-next, which I can forward propagate in the network.
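Under those definitions, computing the moving target and the L2 loss for one transition can be sketched as follows (the names are illustrative, and the terminal-state case, where the target is just r, is a standard detail I've added):

```python
GAMMA = 0.9

def dqn_target(reward, next_q_scores, done):
    """Moving label y = r + gamma * max_a' Q(s_next, a'); just r at a terminal."""
    return reward if done else reward + GAMMA * max(next_q_scores)

def l2_loss(q_value, y):
    """Regression loss (y - Q)^2 between the current score and the proxy label."""
    return (y - q_value) ** 2

# Hypothetical transition: in s3 we move left to s2 for reward 0, and forward
# propagating s2 gives Q scores [2.0, 9.0] for (left, right).
y = dqn_target(0.0, [2.0, 9.0], done=False)   # 0 + 0.9 * 9 = 8.1
loss = l2_loss(5.0, y)                        # the gap that backprop will shrink
print(y, loss)
```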
[00:33:11] And you see what is the maximum I can get from this state: take the maximum value and plug it in here. So this is, hopefully, what the optimal Q should follow; it's a proxy for a good label. We know that the Bellman equation tells us the best Q satisfies this equation, when in fact this equation is not true yet, because the true equation would have Q star here, not Q; Q star is the optimal Q. [00:33:42] What we hope is that if we use this proxy as our label, and we learn the difference between where we are now and this proxy, we can then update the proxy, get closer to optimality, train again, update the proxy, get closer to optimality, train again, and so on. Our only hope is that this will converge. [00:34:00] So, does it make sense how this is different from deep learning? The labels are moving; they're not static labels. We define a label to be a best guess of what would be the best Q
function we have. [00:34:19] Then we compute the loss between where the Q function is right now and this guess, and we backpropagate so that the Q function gets closer to our best guess. Now that we have a better Q function, we can make a better guess, so we make a better guess and fix it; we compute the difference between the Q function we have and our best guess, we backpropagate, we get closer to our best guess, and we can update our best guess again. We hope that doing this iteratively will end with convergence, and with a Q function that is very close to satisfying the Bellman equation, the optimal Bellman equation. Does it make sense? This is the most complicated part of Q-learning. [00:34:59] Yeah: we generate the output of the network, we get the Q function, and we compare it to the best Q function, the one that we think satisfies the Bellman equation. We don't know it, but we
don't, but we guess it based on the Q we have. [00:35:26] Basically, when you have Q you can compute this Bellman expression, and it will give you some values. These values are probably closer to where you want to get than where you are now; where you are now is further from this optimality, and you want to close that gap, so you backpropagate. [00:35:49] Yes? The question is: is there a possibility for this to diverge? That's a broader discussion that would take a full lecture to prove, so I put a paper here by Francisco Melo which proves the convergence of this algorithm. It does converge, and in fact it converges because we're using a lot of tips and tricks that we will see later. If you want to see the math behind it, and it is a full lecture of proof, I invite you to look at this simple proof of convergence of the Bellman equation. OK. So
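The moving-label idea can be sketched in miniature with a tabular Q and no neural network: rebuild the Bellman proxy targets y(s, a) from the current Q, move Q onto them, and repeat with the improved proxy. The chain MDP below is invented purely for illustration.

```python
GAMMA = 0.9

# Toy deterministic 4-state chain (hypothetical): actions 0 = left,
# 1 = right; entering state 3 gives reward 1 and is terminal.
def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s_next, (1.0 if s_next == 3 else 0.0), s_next == 3

Q = [[0.0, 0.0] for _ in range(4)]
for sweep in range(20):
    # Build the moving labels y(s, a) from the CURRENT Q (the Bellman proxy)...
    y = [[0.0, 0.0] for _ in range(4)]
    for s in range(3):            # state 3 is terminal; no actions from it
        for a in (0, 1):
            s_next, r, done = step(s, a)
            y[s][a] = r if done else r + GAMMA * max(Q[s_next])
    # ...then move Q onto those labels and repeat with a better proxy.
    for s in range(3):
        for a in (0, 1):
            Q[s][a] = y[s][a]

# The proxy labels stop moving once Q satisfies the Bellman equation:
# Q[2][1] -> 1.0, Q[1][1] -> 0.9, Q[0][1] -> 0.81.
```

After a few sweeps the labels and Q agree, which is exactly the fixed point the lecture is hoping the deep version converges to.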
this is the case where the left score is higher [00:36:24] than the right score, and we have two terms in our target: the immediate reward for taking action left, and the discounted maximum future reward when you are in state s_next. OK. [00:36:41] The tricky part is this: let's say we compute that. We can do it; we have everything we need to compute our target. We have R, which is defined by the human at the beginning, and we can also get this number, because we know that if we take action left we get s_next, we forward propagate s_next in the network, and we take the maximum output. So we have everything in this equation. The problem now is: if I plug this and my Q score into my loss function and I ask you to backpropagate, backpropagation is W = W - alpha times the derivative of the loss function with respect to W, the parameters of the network.
Which term will have a nonzero [00:37:25] value? Obviously the second term, Q(s, left), will have a nonzero value, because it depends on the parameters of the network, W. But y will also have a nonzero value, because you have Q in it. So how do you handle that? You actually get a feedback loop in this backpropagation that makes the network unstable. What we do is consider this Q fixed: the Q in our target is going to be fixed for many iterations, say a million or a hundred thousand iterations, until we get close to it and our gradient is small; then we will update it and fix it again. So we actually have two networks in parallel: one that is fixed and one that is not fixed. [00:38:13] OK. The second case is similar: if the Q score to go right were higher than the Q score to go left, we would define our target as the
immediate [00:38:25] reward of going to the right, plus gamma times the maximum Q score we get if we're in the next state and take the best action. Does this make sense? This is the most complicated part of Q-learning; this is the hard part to understand. So: the immediate reward to go right, plus the discounted maximum future reward when you're in state s_next. I'm going to draw it. This part is held fixed for backprop, so no derivative. If we do that, then no problem: y is just a number. [00:38:56] We come back to our original supervised learning setting: y is a number, we compute the loss, and we backpropagate. No difference. OK. So compute dL/dW and update W using a stochastic gradient descent method: RMSprop, Adam, whatever you guys want. [00:39:18] Now let's go over the full DQN, deep Q-network, implementation. This slide is pseudocode to help you understand how this
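One such gradient step can be sketched with a tiny linear Q-function (all numbers here are hypothetical): the target y is built from a frozen copy of the weights, so it is just a constant in the derivative, and only the live Q contributes to dL/dW.

```python
GAMMA, LR = 0.9, 0.01

w = [0.2, -0.1]        # trainable weights, one per action (toy values)
w_tgt = list(w)        # frozen copy, used only to build the target y

def q(weights, s, a):
    # Toy linear Q-function: Q_w(s, a) = w[a] * s
    return weights[a] * s

# One observed transition (s, a, r, s_next), invented for illustration.
s, a, r, s_next = 1.0, 0, 0.5, 2.0

# Target from the FROZEN network: for backprop, y is just a number.
y = r + GAMMA * max(q(w_tgt, s_next, 0), q(w_tgt, s_next, 1))

# Squared loss L = (y - Q_w(s, a))^2; no derivative flows through y, so
# dL/dw[a] = -2 * (y - Q_w(s, a)) * s.
grad = -2.0 * (y - q(w, s, a)) * s
w[a] = w[a] - LR * grad          # W = W - alpha * dL/dW
```

Note that `w_tgt` is untouched by the step; in the full algorithm it is only refreshed every many iterations.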
entire algorithm works. [00:39:32] We will actually plug many methods into this pseudocode, so please focus right now; if you understand this, you understand the entire rest of the lecture. We initialize our two networks' parameters, just as we initialize a network in deep learning. We loop over episodes. Let's define an episode to be one game, going from the start to a terminal state; that's one episode. We can also sometimes define episodes to be many states, like in Breakout, the game with the paddle: it's usually 20 points, and the first player to get 20 points finishes the game, so an episode would be 20 points. [00:40:09] Inside the loop over episodes, start from an initial state s; in our case there is only one initial state, which is state two. Then loop over time steps: forward propagate s (state two) through the Q network, execute the action a which has the maximum Q score, observe the immediate reward R and
the next state s'. [00:40:34] Compute the target y; to compute y, we know that we need to take s' and forward propagate it through the network again. Then compute the loss function and update the parameters with gradient descent. Does this loop make sense? It's very close to what we do in general; the only difference is this part: we compute the target y using a double forward propagation, so we forward propagate two times in each loop. [00:41:01] Do you guys have any questions on this pseudocode? [00:41:12] OK, so we will now see a concrete application of a deep Q-network. That was the theoretical part; now we're going to the practical part, which is going to be more fun. Let's look at this game. It's called Breakout. The goal when you play Breakout is to destroy all the bricks without letting the ball pass the line at the bottom. We have a paddle, and our decisions can be: idle, i.e., stay where
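The loop just described, together with the frozen target network from earlier, can be sketched end to end. Here a tabular Q on an invented 5-state chain stands in for the neural network, so the "two forward passes" per time step are just table lookups; the episode loop, the double evaluation, and the periodic target sync are the point.

```python
import random

GAMMA, ALPHA, SYNC_EVERY = 0.9, 0.5, 50
N_STATES, TERMINAL = 5, 4          # toy chain; state 4 is terminal

def env_step(s, a):                # actions: 0 = left, 1 = right
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == TERMINAL else 0.0), s2 == TERMINAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # trained "network"
Q_tgt = [row[:] for row in Q]               # frozen copy for targets

random.seed(1)
t = 0
for episode in range(300):                  # loop over episodes
    s, done = 0, False                      # start from the initial state
    while not done:                         # loop over time steps
        # Forward pass 1: pick the action (epsilon-greedy, eps = 0.3).
        if random.random() < 0.3:
            a = random.randrange(2)
        else:
            a = Q[s].index(max(Q[s]))
        s2, r, done = env_step(s, a)        # observe reward, next state
        # Forward pass 2: build the target y from the FROZEN network.
        y = r if done else r + GAMMA * max(Q_tgt[s2])
        Q[s][a] += ALPHA * (y - Q[s][a])    # step toward the target
        s, t = s2, t + 1
        if t % SYNC_EVERY == 0:             # refresh the frozen network
            Q_tgt = [row[:] for row in Q]
```

The epsilon-greedy choice and the sync period are hypothetical hyperparameters; in the deep version the table lookups become network forward passes and the update becomes a gradient step.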
you are; move the paddle to the right; or move the paddle to the left. [00:41:53] This demo, with the credits at the bottom of the slide, shows that after training on Breakout using Q-learning, they get a super intelligent agent which figures out a trick to finish the game very quickly. Actually, even good players don't know this trick; professional players do know it. In Breakout you can try to dig a tunnel to get to the other side of the bricks, and then you destroy all the bricks super quickly, from top to bottom instead of bottom up. What's super interesting is that the network figured this out on its own, without human supervision, and this is the kind of thing we want to remember: if we were to input the Go board and output professional players' moves, we would not figure out that type of thing most of the time. [00:42:44] So my question is: what's the input of the Q network in this setting?
[00:42:50] Our goal is to destroy all the bricks, to play Breakout. What should be the input? [00:43:10] Try something. The position of the bricks? The position of the paddle? What else? The ball position? OK, yeah, I agree. [00:43:24] This is what we would call a feature representation: when you're in an environment, you can extract some features, and these are examples of features. Give me the position of the ball: that's one feature. Give me the position of the bricks: another feature. Give me the position of the paddle: another feature. Those are good features for this game, but if you want to get the entire information, you'd better do something else. Yeah: the pixels. [00:43:56] You don't want any human supervision; you don't want to hand-design features. You just take the pixels, play the game, control the paddle; take the pixels. So yeah, this is a good input to the Q network. What's the output?
I said it [00:44:10] earlier: the output of the network will probably be three Q values, representing the actions of going left, going right, and staying idle in the specific state that is the input of the network. So given a pixel image, we want to predict Q scores for the three possible actions. Now, what's the issue with that? Do you think that would work or not? [00:44:41] Can someone think of something going wrong here, looking at the inputs? [00:45:00] OK, I'm going to help you. Yeah? Good point: based on this image, you cannot know if the ball is going up or down. That actually makes it super hard, because the action you take highly depends on whether the ball is going up or down. And even if the ball is going down, you don't know which direction it's going down in. So there's a problem here: there is definitely not enough
information to make a decision on the [00:45:33] action to take, and if it's hard for us, it's going to be hard for the network. So what's a hack to prevent that? It's to take successive frames. Instead of one frame, we can take four successive frames. Here is the same setting as before, but we can see that the ball is going up, we can see which direction it's going, and we know what action we should take, because we know the slope of the ball and whether it's going up or down. Does that make sense? [00:46:05] OK, so this is called preprocessing: given a state, compute a function phi(s) that gives you the history of this state, which is the sequence of the last four frames. What other preprocessing can we do? This is something I want you to be quick about; we learned it together in deep learning: input preprocessing. Remember the second lecture, where the question was: what resolution should we use?
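The phi(s) history can be kept with a fixed-length queue. A minimal sketch; the frames here are placeholder strings standing in for preprocessed images:

```python
from collections import deque

STACK = 4  # number of successive frames in phi(s)

class FrameStacker:
    """phi(s): return the last STACK frames as one stacked observation."""
    def __init__(self):
        self.frames = deque(maxlen=STACK)

    def reset(self, first_frame):
        # At the start of an episode, repeat the first frame to fill history.
        for _ in range(STACK):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame):
        self.frames.append(frame)      # the oldest frame drops out
        return list(self.frames)

phi = FrameStacker()
obs0 = phi.reset("f0")   # ['f0', 'f0', 'f0', 'f0']
obs1 = phi.step("f1")    # ['f0', 'f0', 'f0', 'f1']
```

With real images the four frames would be stacked along the channel dimension before being fed to the network.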
Remember, you had a cat [00:46:41] recognition task: what resolution would you want to use? Same thing here: if we can reduce the size of the input, let's do it; if we don't need all that information, let's do it. For example, do you think the colors are important? Very minor; I don't think they're important. So maybe we can grayscale everything: that converts three channels into one channel, which is amazing in terms of computation. What else? [00:47:10] I think we can crop a lot of this; maybe there's a line here above which we don't need to make any decision, and we don't need this score. Maybe. Actually, there are some games where the score is important for decision making. An example is football, or soccer: when you're winning 1-0, and you're playing against a strong team, you'd better get back and defend to keep that 1-0. So the score
is actually [00:47:39] important in the decision-making process, and in fact there are famous coaches in football who have this technique called "park the bus", where you just put your whole team in front of the goal once you have scored. So that's an example. Here, there is no park the bus, and we can definitely get rid of the score, which removes some pixels and reduces the number of computations, and we can reduce to grayscale. [00:48:05] One important thing to be careful about when you reduce to grayscale is that grayscale is a dimensionality reduction technique: you lose information. If you have three channels and you reduce everything to one channel, sometimes different color pixels will end up with the same grayscale value, depending on the grayscale mapping you use, and it's been observed that you sometimes lose information this way. So
let's say the ball and some bricks [00:48:31] have the same grayscale value: then you would not be able to differentiate them. Or say the paddle and the background have the same grayscale value: then you would not differentiate them either. So you have to be careful about that type of thing, and there are methods that do grayscale in other ways, such as luminance. [00:48:48] So we have our phi(s), which is the input to the Q network, and the deep Q-network architecture is going to be a convolutional neural network, because we're working with images. We forward propagate through it; this is the architecture from Mnih, Kavukcuoglu, Silver et al.: convolutional layers followed by two fully connected layers, and you get your Q scores. [00:49:17] Now we get back to our training loop. What do we need to change in it? We said that one frame is not enough, so we preprocess all the frames: the initial
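The grayscale caution is easy to demonstrate: with a naive equal-weight mapping, two visibly different colors (pixel values invented here) collapse to the same gray level, so the ball could become indistinguishable from a brick or the background.

```python
def to_gray(rgb):
    # Naive equal-weight grayscale; luminance weighting (e.g. 0.299 R +
    # 0.587 G + 0.114 B) is one of the alternative mappings mentioned above.
    return sum(rgb) // 3

ball = (90, 120, 150)       # hypothetical bluish ball pixel
brick = (120, 120, 120)     # hypothetical gray brick pixel
same = to_gray(ball) == to_gray(brick)   # True: the distinction is lost
```

Both pixels map to gray level 120, which is exactly the kind of collision to check for before committing to a grayscale preprocessing step.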
state s is converted [00:49:29] to phi(s), the forward-propagated state is phi(s), and so on: everywhere we had s or s', we convert it to phi(s) or phi(s'), which gives us the history. Now, there are a lot more techniques we can plug in here, and we will see three more. One is keeping track of the terminal state. In this loop we should keep track of the terminal state, because we said that if we reach a terminal state we want to end the loop, break the loop. Another reason involves the target y. [00:49:57] Basically, we have to create a boolean to detect terminal states before looping through the time steps, and inside the loop we want to check whether the new s' we're going to is a terminal state. If it is, I can stop this loop and go back and play another episode: start at another starting state and continue my game. Now, the target y that
we compute [00:50:26] is different depending on whether we're in a terminal state or not, because if we're in a terminal state there is no reason to have a discounted long-term reward: there's nothing beyond that terminal state. So if we're in a terminal state, we just set y to the immediate reward, and we break; if we're not in a terminal state, we add the discounted future reward. Any questions on that? [00:50:54] Yep. Another issue we see here, which makes this reinforcement learning setting super different from the classic supervised learning setting, is that we only train on what we explore. I start in a state s, I forward propagate phi(s) through my network, I get my vector of Q values, I select the best Q value, the largest, and I get a new state, because I can now move from state s to s'. So I have a transition: from s, take action a, get s'; or, from phi(s),
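The terminal-state rule for the target reduces to a single branch; a minimal sketch, with a hypothetical discount factor:

```python
GAMMA = 0.9  # discount factor (hypothetical value)

def compute_target(r, done, max_q_next):
    # Terminal state: nothing comes after it, so no discounted future term.
    if done:
        return r
    return r + GAMMA * max_q_next

y_terminal = compute_target(1.0, True, 5.0)    # -> 1.0
y_regular = compute_target(0.0, False, 5.0)    # -> 4.5
```

In the training loop, `done` is exactly the boolean the lecture says to maintain, and `max_q_next` comes from forward propagating phi(s') through the frozen target network.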
take action a, get phi(s'). [00:51:35] Now, this is what I will use to train my network: I can forward propagate phi(s') through the network and get my target y, compare my y to my Q, and then backpropagate. The issue is that I may never explore this state transition again; maybe I will never get there anymore. That's super different from what we do in supervised learning, where you have a dataset, and your dataset can be used many times, with batch gradient descent or any gradient descent algorithm. In one epoch you see all the data points; if you do two epochs, you see every data point two times; if you do ten epochs, ten times. [00:52:15] So every data point can be used several times to train your algorithm in the classic deep learning we've seen together. In this case it doesn't seem possible, because we only train on what we
explore, [00:52:27] and we might never get back there, especially because the training will be influenced by where we go. Maybe there are some places we will never visit, because what we train on and what we learn will direct our decision process, and we will never train on some parts of the game. This is why we have other techniques to keep this training stable. One is called experience replay. [00:52:45] As I said, here is what we are currently doing: we have phi(s), we forward propagate, we take action a, we observe an immediate reward R and a new state phi(s'). Then from phi(s') we can take a new action a', observe a new reward R', and a new state phi(s''), and so on. Each of these is called a state transition, and each can be used to train: one experience leads to one iteration of gradient descent. e1, e2, e3: experience one, experience two, experience
[00:53:26] experience one experience to experience tree and the training will be trained on [00:53:28] tree and the training will be trained on experience one then trained on [00:53:30] experience one then trained on experience two then trained our [00:53:31] experience two then trained our experience tree what we're doing with [00:53:33] experience tree what we're doing with experience replay is the following we [00:53:36] experience replay is the following we will observe experience one because we [00:53:39] will observe experience one because we start in a site we take an action we see [00:53:41] start in a site we take an action we see another state and earn a reward and this [00:53:43] another state and earn a reward and this is called experience one we will create [00:53:45] is called experience one we will create a replay memory you can think of it as a [00:53:49] a replay memory you can think of it as a data structure in computer science and [00:53:51] data structure in computer science and you will place this experience one topo [00:53:53] you will place this experience one topo in the [00:53:53] in the your play memory then from there we will [00:53:56] your play memory then from there we will experience experience - we will put [00:53:59] experience experience - we will put experience - in the replay memory same [00:54:01] experience - in the replay memory same with experience 3 put it in a replay [00:54:02] with experience 3 put it in a replay memory and so on [00:54:04] memory and so on now during training what we will do is [00:54:07] now during training what we will do is we will first train on experience 1 [00:54:09] we will first train on experience 1 because it's the only experience we have [00:54:11] because it's the only experience we have so so far next step instead of training [00:54:15] so so far next step instead of training on e 2 we will train on a sample from a [00:54:17] on e 2 we will train on a sample from a 1 in we - it means we will 
take one out [00:54:20] 1 in we - it means we will take one out of the replay memory and use this one [00:54:21] of the replay memory and use this one for training but we will still continue [00:54:25] for training but we will still continue to experiment something else and we will [00:54:28] to experiment something else and we will sample from there and at every step the [00:54:31] sample from there and at every step the replay memory will become bigger and [00:54:32] replay memory will become bigger and bigger and while we train we will not [00:54:35] bigger and while we train we will not necessarily train on the step we explore [00:54:36] necessarily train on the step we explore we will train on a sample which is the [00:54:39] we will train on a sample which is the replay memory + the new state way we [00:54:41] replay memory + the new state way we explore why is it good is because e 1 as [00:54:46] explore why is it good is because e 1 as you see can be useful many times in the [00:54:48] you see can be useful many times in the training and maybe one was a critical [00:54:50] training and maybe one was a critical state like it was a very important data [00:54:52] state like it was a very important data point to learn or q function and so on [00:54:56] point to learn or q function and so on and so on does the replay memory make [00:54:58] and so on does the replay memory make sense so several advantages one is data [00:55:02] sense so several advantages one is data efficiency we can use data many times [00:55:05] efficiency we can use data many times don't have to use one day to appoint [00:55:06] don't have to use one day to appoint only one time another very important [00:55:10] only one time another very important advantage of experience replay is that [00:55:13] advantage of experience replay is that if you don't use experience replay you [00:55:16] if you don't use experience replay you have a lot of correlation between the [00:55:18] have a lot of 
correlation between the successive data points so let's say the [00:55:20] successive data points so let's say the ball is on the bottom right here and the [00:55:23] ball is on the bottom right here and the ball is going to the top left for the [00:55:26] ball is going to the top left for the next 10 data points the ball is always [00:55:30] next 10 data points the ball is always going to go to the top left and it means [00:55:33] going to go to the top left and it means the action you can take is always the [00:55:37] the action you can take is always the same it actually doesn't matter a lot [00:55:39] same it actually doesn't matter a lot because the ball is going up but most [00:55:41] because the ball is going up but most likely you want to followed where the [00:55:43] likely you want to followed where the ball is going so the action will be to [00:55:45] ball is going so the action will be to go towards the ball for 10 actions in a [00:55:48] go towards the ball for 10 actions in a row and then the ball will bounce on the [00:55:51] row and then the ball will bounce on the wall and on the top and go back down [00:55:53] wall and on the top and go back down here down to the bottom left the bottom [00:55:56] here down to the bottom left the bottom right what will happen if your paddle is [00:55:59] right what will happen if your paddle is here is that for 10 steps in a row you [00:56:01] here is that for 10 steps in a row you will send your paddle on the right [00:56:04] will send your paddle on the right remember what we said when which when we [00:56:06] remember what we said when which when we asked the [00:56:07] asked the question if you had to train a cat vs. [00:56:09] question if you had to train a cat vs. 
dog classifier with batches of images of [00:56:11] dog classifier with batches of images of cats batches of images of dog trained [00:56:14] cats batches of images of dog trained first on the cats then trains on the [00:56:15] first on the cats then trains on the dogs then trains on the cats then trains [00:56:17] dogs then trains on the cats then trains on the dogs we will not converge because [00:56:19] on the dogs we will not converge because your network will be super biased [00:56:21] your network will be super biased towards predicting chat after seeing ten [00:56:23] towards predicting chat after seeing ten images of cat super bias bit with [00:56:26] images of cat super bias bit with predicting dogs when it sees ten images [00:56:28] predicting dogs when it sees ten images of dog that's what's happening here [00:56:30] of dog that's what's happening here so you want to deke or elate all these [00:56:33] so you want to deke or elate all these experiences you want to be able to take [00:56:35] experiences you want to be able to take one experience take another one that has [00:56:36] one experience take another one that has nothing to do with it and so on this is [00:56:39] nothing to do with it and so on this is what experience pure play goes and the [00:56:41] what experience pure play goes and the third one is that the third one is that [00:56:44] third one is that the third one is that you're basically trading computation and [00:56:47] you're basically trading computation and memory against exploration exploration [00:56:50] memory against exploration exploration is super costly the state space might be [00:56:52] is super costly the state space might be super big but you know you have enough [00:56:55] super big but you know you have enough computation probably you can have a lot [00:56:56] computation probably you can have a lot of competition and you have memory space [00:56:58] of competition and you have memory space let's use an experience replay 
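The replay memory described above ("you can think of it as a data structure") can be sketched as a fixed-capacity buffer of transition tuples. This is a minimal illustration, not DeepMind's implementation; the class and parameter names here are my own.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest experience once the buffer is full
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling is what decorrelates successive transitions
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

In the full algorithm, each environment step calls `add` with the observed transition, and each gradient step calls `sample` to get a decorrelated minibatch, so one transition can be reused many times.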
[00:57:02] Okay, so let's add experience replay to our code. Here, the transition resulting from this part is added to the replay memory D and will not necessarily be used in this iteration. So what's happening is: we forward propagate Phi(s), we take an action a and observe a reward, and this action leads to a state Phi(s'); this is an experience. Instead of training on this experience, I'm just going to take it and put it in the replay memory ("add experience to replay memory"), and what I will train on is not this experience but a random minibatch of transitions sampled from the replay memory. So you see, we're exploring, but we're not training on what we explore; we're training on the replay memory. But the replay memory is dynamic: it changes and is updated. The sampled transitions from the replay memory will be used to do the update.
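The target for each sampled transition, y_j = r_j if the transition is terminal and y_j = r_j + gamma * max_a' Q(Phi(s'_j), a') otherwise, can be sketched with NumPy. This is an illustrative helper with names of my own choosing; `next_q_values` stands in for the Q-network's outputs on the sampled next states.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """y_j = r_j for terminal transitions, else r_j + gamma * max_a' Q(phi'_j, a')."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    # max over the action dimension for each sampled next state
    max_next = np.asarray(next_q_values, dtype=float).max(axis=1)
    return np.where(dones, rewards, rewards + gamma * max_next)
```

The network is then trained by one gradient step pushing Q(Phi(s_j), a_j) toward y_j for the minibatch.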
That's the hack. [00:58:03] Now, the last hack we want to talk about is exploration versus exploitation. As a human, let's say you're commuting to Stanford every day. You know the road you're commuting on, you always take the same road, and you're biased towards taking this road. Why? Because the first time you took it, it went well, and the more you take it, the more you learn about it. You know the tricks of how to drive fast; you know that this light is going to be green at that moment, and so on. You build very good expertise in this road, you're a super expert, but maybe there's another road that you don't even try that is better. You just don't try it because you're focused on that road: you're doing exploitation, you exploit what you already know. Exploration would be: okay, let's do it, I'm going to try another road today. I might get to the course late, but maybe I will make a good discovery, I will like this road, and I will take it later on. There's a trade-off between these two, because the RL algorithm is going to figure out some strategies that are super good and will try to do local search in these to get better and better, but there might be another minimum that is better than this one, and you don't explore it. With the algorithm we currently have there is no trade-off between exploitation and exploration: we are almost doing only exploitation. So how do we incentivize exploration? Do you guys have an idea?

[00:59:45] So right now, when we're in a state s, we forward propagate the state through the network and we always take the action that is the best action. We're exploiting what we already know: we take the best action. Instead of taking this best action, what can we do? Yep: Monte Carlo sampling, picking another action, trying something else to get out of there; that's the ratio of times you take the best action versus exploring another action. Okay: take a hyperparameter that tells you when you can explore and when you can exploit, is that what you mean? Yeah, that's a good point, and that's the solution: you can take a hyperparameter that is a probability, telling you to explore with this probability, and otherwise, with one minus this probability, to exploit. That's what we're going to do.

[01:00:44] So let's look at why exploitation without exploration doesn't work. We're in initial state s1 and we have three options: either we use action a1 to go to s2 and get a reward of 0, or we use action a2 to go to s3 and get a reward of 1, or we use action a3 to go to s4 and get a reward of 1,000. This is obviously where we want to go: we want to go to s4, because it has the maximum reward, and we don't need to do much computation in our head. It's simple, there is no discount, it's direct. Just after initializing the Q-network, you get the following Q-values: forward propagate s1 through the network and get 0.5 for taking action 1, 0.4 for taking action 2, and 0.3 for taking action 3. This is obviously not good, but our network was randomly initialized. What it's telling us is that 0.5 is the maximum, so we should take action 1. So let's take action 1 and observe s2: we observe a reward of 0. Our target, because it's a terminal state, is equal to the reward only; there is no additional term. We want our Q to match our target; our target is 0, so Q should match 0. So we train, and we get a Q that should be 0. Does that make sense?

[01:02:07] Now we do another round of iteration. We're in s1, back at the beginning of the episode, and we see that our Q-function tells us that action 2 is the best, because 0.4 is the maximum value. It means go to s3. I go to s3, I observe a reward of 1. What does it mean? It's a terminal state, so my target is 1: y equals 1. I want the Q to match my y, so my Q should be 1. Now I continue. Third step: the Q-function says go via a2, so I go via a2, and nothing happens; I already matched the reward. Fourth step: go via a2. You see what happens? We will never get there, to s4, because we're not exploring. So instead of doing that, what we're saying is: 5% of the time, take a random action to explore, and 95% of the time, follow your exploitation. So that's what we add: with probability epsilon, the hyperparameter, take a random action a; otherwise, do what we were doing before and exploit. Does that make sense? Okay, cool.
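The epsilon-greedy rule just described (with probability epsilon take a random action, otherwise take the argmax of the Q-values) is only a few lines. A sketch in Python, using the toy Q-values from the example:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon take a random action, otherwise the argmax action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# With the randomly initialized values Q(s1, .) = [0.5, 0.4, 0.3],
# pure exploitation (epsilon = 0) always returns action 0.
```

With epsilon = 0.05 the agent eventually tries the third action and can discover the reward of 1,000.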
[01:03:22] So now we've plugged all these tricks into our pseudocode, and this is our new pseudocode. We have to initialize the replay memory, which we didn't have to do earlier. In blue you can find the lines added for the replay memory, in orange the lines added for checking the terminal state, in purple the line added for epsilon-greedy exploration versus exploitation, and finally, in bold, the preprocessing. Any questions on that? So that's what we wanted to see: a variant of how deep learning can be used in a setting that is not necessarily the classic supervised learning setting. [01:04:06] And you see that the main advantage of deep learning in this case is that it's a good function approximator: the convolutional neural network can extract a lot of information from the pixels that we were not able to get with other networks. Okay, so let's see what we have
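To see all the pieces of that final pseudocode working together (replay memory, terminal-state targets, epsilon-greedy), here is a runnable miniature on the lecture's three-action toy MDP. It is a sketch under simplifying assumptions, not the real DQN: a Q-table over s1's three actions stands in for the Q-network, every transition is terminal so the target is just y = r, and epsilon is set higher than the lecture's 5% so the tiny run converges quickly.

```python
import random

def train_toy_dqn(episodes=2000, epsilon=0.2, lr=0.5, seed=0):
    """DQN pseudocode on the toy MDP: from s1, action i yields reward
    [0, 1, 1000][i] and a terminal state."""
    random.seed(seed)
    rewards = [0.0, 1.0, 1000.0]
    q = [0.5, 0.4, 0.3]        # the "randomly initialized" Q(s1, a) from the example
    memory = []                # replay memory D
    for _ in range(episodes):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(3)                   # explore
        else:
            a = max(range(3), key=lambda i: q[i])     # exploit
        memory.append((a, rewards[a]))                # store the transition in D
        a_j, r_j = random.choice(memory)              # sample a transition from D
        q[a_j] += lr * (r_j - q[a_j])                 # terminal state: target y = r
    return q

q_final = train_toy_dqn()
```

With exploration, the agent eventually tries action 3, the replay memory lets that rare transition be reused, and Q(s1, a3) climbs toward 1,000; with epsilon = 0 it would stay stuck on the first two actions, exactly as in the walkthrough.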
here we have our super battery but [01:04:26] have here we have our super battery but that's gonna dig a tunnel and it's going [01:04:28] that's gonna dig a tunnel and it's going to destroy all the bricks super quickly [01:04:32] it's good to see that after building it [01:04:34] it's good to see that after building it like so this is work from deep mines [01:04:37] like so this is work from deep mines team and you can find this video on [01:04:39] team and you can find this video on YouTube okay another thing I wanted to [01:04:43] YouTube okay another thing I wanted to say quickly is what's the difference [01:04:44] say quickly is what's the difference between weed and without human knowledge [01:04:46] between weed and without human knowledge you will see a lot of people a lot of [01:04:48] you will see a lot of people a lot of papers mentioning that this algorithm [01:04:51] papers mentioning that this algorithm was trained with human learned knowledge [01:04:53] was trained with human learned knowledge or this algorithm was trained without [01:04:55] or this algorithm was trained without any human in the loop [01:04:56] any human in the loop why is human knowledge very important [01:05:00] why is human knowledge very important like think about it just playing one [01:05:03] like think about it just playing one game as a human and teaching that the [01:05:05] game as a human and teaching that the algorithm will help the algorithm a lot [01:05:07] algorithm will help the algorithm a lot when the algorithm sees this game what [01:05:11] when the algorithm sees this game what it sees its pixels what we see when we [01:05:15] it sees its pixels what we see when we see that game we see that there is a key [01:05:17] see that game we see that there is a key here we know the key is usually a good [01:05:19] here we know the key is usually a good thing so we have a lot of context right [01:05:21] thing so we have a lot of context right as a human we know I'm probably 
gonna go [01:05:24] as a human we know I'm probably gonna go for the key I'm not gonna go for this [01:05:25] for the key I'm not gonna go for this this thing no same ladder [01:05:28] this thing no same ladder what is the ladder we directly identify [01:05:30] what is the ladder we directly identify that the ladder is something we can go [01:05:32] that the ladder is something we can go up and down we identified that this rope [01:05:35] up and down we identified that this rope is probably something I can use to jump [01:05:36] is probably something I can use to jump from one side to the other so as a human [01:05:38] from one side to the other so as a human there is a lot more background [01:05:40] there is a lot more background information that we have even without [01:05:41] information that we have even without knowing it without realizing it so [01:05:44] knowing it without realizing it so there's a huge difference between [01:05:45] there's a huge difference between algorithms trained with [01:05:47] algorithms trained with human-in-the-loop and without human in [01:05:49] human-in-the-loop and without human in the loop this game is actually Montezuma [01:05:51] the loop this game is actually Montezuma revenge the dqn algorithm when the paper [01:05:54] revenge the dqn algorithm when the paper came out on underneath on nature in [01:05:56] came out on underneath on nature in nature the second the second version of [01:05:58] nature the second the second version of the paper they showed that they beat [01:06:00] the paper they showed that they beat human on 49 games that are the same type [01:06:03] human on 49 games that are the same type of games I as break out but this one was [01:06:05] of games I as break out but this one was the hardest one so they couldn't beat [01:06:08] the hardest one so they couldn't beat human on this one and the reason was [01:06:11] human on this one and the reason was because there's a lot of information and [01:06:13] because 
there's a lot of information and also the game has is very long [01:06:17] also the game has is very long so in order it's called Montezuma [01:06:18] so in order it's called Montezuma revenge and I think ranting pyramids is [01:06:21] revenge and I think ranting pyramids is going to talk about it a little later [01:06:22] going to talk about it a little later but in order to get to win this game you [01:06:25] but in order to get to win this game you have to go through a lot of different [01:06:27] have to go through a lot of different stages and it's super long so it's super [01:06:30] stages and it's super long so it's super hard for the algorithm to explore all [01:06:33] hard for the algorithm to explore all the state space okay so that said I will [01:06:38] the state space okay so that said I will show you a few more games that that the [01:06:41] show you a few more games that that the deepmind team has solved pong is one [01:06:43] deepmind team has solved pong is one sequence is another one and space [01:06:46] sequence is another one and space invaders that you might know which which [01:06:48] invaders that you might know which which is probably the most famous of the three [01:06:50] is probably the most famous of the three Juno okay so that said I'm gonna hand in [01:06:56] Juno okay so that said I'm gonna hand in the microphone to we're lucky to have an [01:06:59] the microphone to we're lucky to have an oral expert so Rammstein terawatts is a [01:07:02] oral expert so Rammstein terawatts is a fourth-year PhD students in RL working [01:07:06] fourth-year PhD students in RL working with professor Bernstein at Stanford and [01:07:08] with professor Bernstein at Stanford and he will tell us a little bit about his [01:07:11] he will tell us a little bit about his experience and he will show us some [01:07:12] experience and he will show us some advanced applications of deep learning [01:07:14] advanced applications of deep learning and RL and how these 
plug in together [01:07:18] and RL and how these plug in together thank you thanks Cal for that [01:07:20] thank you thanks Cal for that introduction [01:07:21] introduction oh yeah can everyone hear me now all [01:07:24] oh yeah can everyone hear me now all right good cool okay first I have like [01:07:29] right good cool okay first I have like eight nine minutes I have more okay okay [01:07:34] eight nine minutes I have more okay okay first question after seeing that lecture [01:07:38] first question after seeing that lecture so far look how many are you're thinking [01:07:41] so far look how many are you're thinking that RL is actually cool look honestly [01:07:43] that RL is actually cool look honestly that's like oh that's a lot [01:07:46] that's like oh that's a lot oh yeah that's a lot okay my hope is [01:07:50] oh yeah that's a lot okay my hope is after showing you some other advanced [01:07:52] after showing you some other advanced topics ears then the percentage got even [01:07:54] topics ears then the percentage got even increase so let's let's see it's almost [01:07:59] increase so let's let's see it's almost impossible to talk about like [01:08:01] impossible to talk about like advancement RL like recently without [01:08:03] advancement RL like recently without mentioning alphago [01:08:04] mentioning alphago I think somewhere right now who wrote [01:08:06] I think somewhere right now who wrote that on a table that it's almost 10 to [01:08:10] that on a table that it's almost 10 to the power 170 different configuration of [01:08:13] the power 170 different configuration of the board and that's roughly more than I [01:08:17] the board and that's roughly more than I mean that's more than the estimated [01:08:19] mean that's more than the estimated number of atoms in the universe so one [01:08:21] number of atoms in the universe so one traditional algorithm before the deep [01:08:24] traditional algorithm before the deep learning and stuff like that was 
like [01:08:25] learning and stuff like that was like three searching RL which is basically go [01:08:29] three searching RL which is basically go exhaustively search all the [01:08:30] exhaustively search all the a possible action that you can take and [01:08:32] a possible action that you can take and they'll take the best one in that [01:08:34] they'll take the best one in that situation also good that's all almost [01:08:36] situation also good that's all almost impossible so what they do that's also a [01:08:39] impossible so what they do that's also a paper from deep mind is that they train [01:08:43] paper from deep mind is that they train anyone Ezra for that they kind of [01:08:45] anyone Ezra for that they kind of marriage the tree search we do a bit [01:08:48] marriage the tree search we do a bit different and neural network that they [01:08:50] different and neural network that they have they have two kinds of networks one [01:08:53] have they have two kinds of networks one is called value network and value [01:08:55] is called value network and value network is basically consuming this [01:08:57] network is basically consuming this image image of a board and telling you [01:09:01] image image of a board and telling you what's the probability that if you can [01:09:03] what's the probability that if you can win in this situation so if the value is [01:09:06] win in this situation so if the value is higher than the probability of winning [01:09:08] higher than the probability of winning is higher how does it help you he help [01:09:11] is higher how does it help you he help you in the case that if you want to [01:09:12] you in the case that if you want to search for the action you don't have to [01:09:14] search for the action you don't have to go until the end of the game because the [01:09:15] go until the end of the game because the end of the game is a lot of steps and [01:09:17] end of the game is a lot of steps and it's almost impossible to go to 
the end [01:09:19] it's almost impossible to go to the end of the game in all the simulations so [01:09:21] of the game in all the simulations so that just helps you to understand what's [01:09:23] that just helps you to understand what's the value of each game like beforehand [01:09:25] the value of each game like beforehand like after look for these simple 50s [01:09:26] like after look for these simple 50s that if you're gonna win that game or if [01:09:28] that if you're gonna win that game or if you're gonna lose that game there's [01:09:29] you're gonna lose that game there's another a network of the policy Network [01:09:32] another a network of the policy Network which helps you to take action but I [01:09:34] which helps you to take action but I think the most interesting thing of the [01:09:37] think the most interesting thing of the Alpha goal is that it's trained from [01:09:39] Alpha goal is that it's trained from scratch so it's trance from nothing and [01:09:42] scratch so it's trance from nothing and if they have a tree called self play [01:09:45] if they have a tree called self play that there is two AI playing with each [01:09:49] that there is two AI playing with each other the best one I replicate the best [01:09:51] other the best one I replicate the best the best one I can keep it fixed and I [01:09:54] the best one I can keep it fixed and I have another one that is trying to cop [01:09:56] have another one that is trying to cop beat the previous version of itself and [01:09:58] beat the previous version of itself and after it complete the previous version [01:10:00] after it complete the previous version of itself like you reliably many times [01:10:02] of itself like you reliably many times then I replace this again for the [01:10:04] then I replace this again for the previous part and then I just said so [01:10:06] previous part and then I just said so this is a training curve of like a self [01:10:08] this is a training curve of like a 
As you see, [01:10:10] it takes a lot of compute, so that's kind of crazy, but finally they beat the [01:10:16] human. Okay, another type of algorithm; this is a whole different class [01:10:23] of algorithms called policy gradients. They developed an algorithm called trust [01:10:29] region policy optimization. Yeah, can I stop this video here? [01:10:37] Okay, great. So, policy gradient algorithms: [01:10:45] what I can do is stop it from here. Okay, so [01:10:53] in the DQN setting that you have seen, you compute a Q [01:11:00] value for each state, and then what you do is take the argmax of [01:11:04] this with respect to the action, and then you choose the action you want, [01:11:07] right? But what you care about at the end of [01:11:10] the day is the action, the mapping from state to action, which [01:11:14] we call a policy. So what you want at the end of the day is actually [01:11:18] the policy (what action should I take?), not really the Q value itself, right? [01:11:21] So this class of methods [01:11:24] called policy gradients tries to directly optimize the policy: [01:11:29] rather than updating the Q function, I compute the gradient of my policy and I [01:11:34] update my policy network again and again and again. So let's see these videos: [01:11:40] this is a guy that is trying to reach the pink ball over there, and [01:11:47] sometimes it gets hit by some external force; it's trained with the [01:11:53] algorithm called PPO, obviously a policy gradient method, and tries to reach that ball. [01:11:58] I think you've heard of OpenAI [01:12:02] Five, the bot that is playing Dota; that is basically the PPO [01:12:09] algorithm, with a lot of compute behind it. I [01:12:15] have the numbers here: there are about 180 [01:12:19] years of play in one day.
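The "compute the gradient of my policy and update again and again" loop can be illustrated with REINFORCE, the simplest policy-gradient method, on a made-up two-armed bandit (a sketch only; PPO and TRPO add trust-region machinery on top of this basic idea, and the reward values here are invented):

```python
import math
import random

random.seed(0)

# REINFORCE on a two-armed bandit: a minimal policy-gradient sketch.
# theta are the logits of a softmax policy; instead of learning Q values
# we nudge theta along grad log pi(a) * reward.
theta = [0.0, 0.0]

def policy(theta):
    """Softmax over the two logits."""
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [p / s for p in z]

rewards = [0.0, 1.0]   # arm 1 is better, known only through sampling
lr = 0.1
for _ in range(2000):
    probs = policy(theta)
    a = 0 if random.random() < probs[0] else 1
    r = rewards[a]
    # gradient of log softmax: indicator(a == i) - pi(i)
    for i in range(2):
        theta[i] += lr * ((1.0 if i == a else 0.0) - probs[i]) * r

print(policy(theta)[1])  # probability of the better arm, close to 1
```

No Q function ever appears: the policy's own parameters are updated directly, which is the distinction the lecture is drawing.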
That is how [01:12:22] much compute it needs. So there [01:12:27] is another, even funnier video; [01:12:33] again the same idea, self-play: you put two agents in front of [01:12:38] each other and they try to beat each other, and whoever wins [01:12:42] gets the reward. The most interesting [01:12:46] part is that, for example, in that game the purpose is just to pull the other [01:12:51] one out, right? But they discover some emergent behaviors [01:12:55] which for us humans make sense, but for them to learn out of [01:13:01] nothing is kind of cool. There's [01:13:15] one risk here when they're playing (okay, this guy's trying to kick the ball [01:13:19] in), and one risk here is to overfit. [01:13:27] That's also cool. One technical [01:13:36] point before we move on: [01:13:41] okay, here there are two agents playing with [01:13:45] each other, and we are just updating against the best other agent, [01:13:49] as [01:13:52] we were doing in self-play. The risk is that you overfit to the particular agent [01:13:56] that's in front of you: the agent in front of you is powerful, but you [01:14:01] might overfit to it, and if I then bring in an agent that is not that powerful but [01:14:04] uses a simple trick that the powerful agent would never use, then you [01:14:09] might just lose the game, right? So one trick here to make it more stable is, [01:14:13] rather than playing against only one agent, you alternate between [01:14:16] different versions of the agent itself, so [01:14:19] it learns all the skills together and doesn't overfit to [01:14:22] one opponent. There's another thing called [01:14:28] meta-learning. Meta-learning is a whole [01:14:31] different family of algorithms again, and the [01:14:33] point is that a lot of tasks are [01:14:35] similar to each other, right? The core [01:14:37] example: walking to the left, walking to the right, and walking in the forward [01:14:40] direction are basically the same task, essentially.
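Before going on, the stabilization trick just described, sampling opponents from a pool of frozen past versions instead of always the single latest agent, might be organized like this (a schematic sketch; `play_and_update` is a hypothetical placeholder for a real self-play game plus a gradient update):

```python
import random

random.seed(0)

# Opponent-pool self-play: a schematic sketch of the trick above, not a
# real training loop. We keep frozen snapshots of past agents and sample
# one as the opponent each episode, so the learner cannot overfit to the
# quirks of a single opponent.
snapshots = [{"version": 0}]          # pool of frozen past agents
current = {"version": 0}              # the agent being trained

def play_and_update(agent, opponent):
    """Hypothetical placeholder: one self-play game plus one RL update."""
    agent["version"] += 1             # pretend the agent improved a bit

for episode in range(9):
    opponent = random.choice(snapshots)   # alternate among past versions
    play_and_update(current, opponent)
    if episode % 3 == 2:                  # periodically freeze a snapshot
        snapshots.append(dict(current))

print(len(snapshots), current["version"])  # 4 snapshots after 9 updates
```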
So the point is, rather than [01:14:44] training on a single task, like "go left" or "go right", you train on a [01:14:49] distribution of tasks that are similar to each other, and then the idea is that [01:14:53] for each specific task I should learn with very few gradient steps: [01:14:59] very few updates should be enough. [01:15:01] So, okay, play this video: at the beginning this agent, which has [01:15:07] been trained with meta-learning before, doesn't know how to move; but [01:15:11] just look at the number of gradient steps: after two or three gradient [01:15:14] steps [01:15:15] it totally knows how to move. That normally takes a lot of steps to train, [01:15:19] but not here, because of the meta-learning approach that's used. [01:15:22] Meta-learning is also cool; the [01:15:25] algorithm is from Berkeley, from Chelsea Finn, [01:15:27] who is now coming to Stanford; it's [01:15:29] called model-agnostic meta-learning. [01:15:34] All right.
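The "few gradient steps should be enough" objective can be shown on a deliberately tiny toy problem with one scalar parameter and hand-derived gradients (the learning rates and task targets are invented for illustration; this is the shape of MAML, not Finn et al.'s full algorithm):

```python
# Toy MAML-style meta-learning. Each "task" t asks the parameter to match
# a target value, with loss (theta - t)^2. The outer loop trains an
# initialization theta from which ONE inner gradient step already does
# well on every task in the distribution.
alpha, beta = 0.45, 1.0          # inner / outer learning rates (invented)
tasks = [-2.0, 0.0, 3.0]         # task distribution: target values
theta = 10.0                     # meta-initialization being learned

for _ in range(200):
    meta_grad = 0.0
    for t in tasks:
        adapted = theta - alpha * 2 * (theta - t)      # one inner step
        # d/dtheta of (adapted - t)^2, with adapted a function of theta
        meta_grad += 2 * (adapted - t) * (1 - 2 * alpha)
    theta -= beta * meta_grad

# After meta-training, a single inner step lands close to every target.
for t in tasks:
    adapted = theta - alpha * 2 * (theta - t)
    print(t, round(adapted, 2))
```

The meta-gradient flows through the inner update, which is exactly what makes the learned initialization "adaptable" rather than merely average.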
Another point: this very interesting game, Montezuma's Revenge. [01:15:39] (How much time do we have? Yeah.) [01:15:44] So, you've seen the exploration / [01:15:48] exploitation dilemma, right? If you don't explore, you're going to fail [01:15:54] many times with the exploration scheme that you just saw. This is a [01:16:02] map of this particular game, and you can see all the rooms of that game; [01:16:07] I think it has twenty-one or twenty-something [01:16:12] different rooms that are hard to reach. So [01:16:16] this recent paper from Google [01:16:18] Brain, from Marc Bellemare and team, is called [01:16:21] unifying the count-based methods for exploration. Exploration is essentially a very [01:16:25] hard challenge, mostly in situations where the reward is sparse, exactly as [01:16:30] in this game: the first reward you get is when you reach the key, right, [01:16:34] and from the top to here it's almost two [01:16:38] hundred steps; you'd need to get the [01:16:41] actions over those two hundred steps exactly [01:16:43] right, and with random exploration that's [01:16:45] almost impossible; you're never going to [01:16:47] do it. The very interesting trick [01:16:50] here is that you keep a count of [01:16:54] how many times you've visited each state, and [01:16:56] then if you visit a state that [01:17:02] has a low count, you give [01:17:04] a reward to the agent; we call it the [01:17:06] intrinsic reward. [01:17:18] So the agent also has an incentive, [01:17:28] the incentive to just go explore, [01:17:31] because that increases the counts of [01:17:34] states it has never seen before. So this [01:17:36] gives it the drive to try new things: [01:17:38] it just goes and visits [01:17:41] different rooms. This game [01:17:45] is interesting; if you ask certain people, [01:17:47] there is a huge amount of research [01:17:51] online on solving this game; [01:17:52] this is the highest score on the game, and [01:17:54] it's just fun to see the agent play. [01:18:04] [Music] [01:18:09] Any questions? All right.
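The count-based bonus just described can be sketched in a few lines (a simplified tabular version; the actual paper derives pseudo-counts from a density model, since on raw pixel states every state would look new):

```python
import math
from collections import defaultdict

# Tabular count-based exploration bonus: a simplified sketch of the idea
# above, assuming a small discrete state space. Rarely visited states
# earn a large intrinsic reward, pushing the agent toward unseen rooms.
visit_counts = defaultdict(int)

def intrinsic_reward(state, scale=1.0):
    """Bonus that decays as a state is visited more often."""
    visit_counts[state] += 1
    return scale / math.sqrt(visit_counts[state])

for _ in range(99):
    intrinsic_reward("room_A")          # a room the agent knows well
print(intrinsic_reward("room_A"))       # 100th visit: bonus 0.1
print(intrinsic_reward("room_B"))       # first visit: bonus 1.0
```

In training, this bonus would be added to the (sparse) environment reward, so the agent is paid for novelty even before it ever reaches the key.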
There is also another interesting point that would be [01:18:22] just fun to know about, called imitation learning. Imitation learning is [01:18:27] for the case where, with an RL agent, [01:18:29] sometimes you don't know the reward. [01:18:32] For example, in Atari games the reward is very well-defined, right? If I get [01:18:36] the key, I get the reward; that's just obvious. But sometimes defining the [01:18:41] reward is hard: for example, when a car (the blue one) wants to drive on [01:18:46] some highway, what is the definition of the reward? [01:18:50] We don't have a clear definition of that. But on the other hand [01:18:52] you have a human [01:18:53] expert who can drive for us, and then we [01:18:56] see: oh, this is the right way of driving. So in this situation we have [01:19:00] something called imitation learning, where we try to mimic the behavior of an [01:19:04] expert. Not exactly copying it, [01:19:07] because if we just copy, then when you [01:19:09] show us a completely different state [01:19:11] we don't know what to do; instead, from [01:19:13] the expert we learn. This is my example, [01:19:15] and there's a paper called [01:19:18] generative adversarial imitation learning, which was from Stefano [01:19:21] Ermon's group here at Stanford; that was also [01:19:23] interesting. Well, I think that's an advanced [01:19:27] topic; if you have any questions, I'm here. [01:19:30] For next week: there are no [01:19:42] new assignments. [01:19:44] Now, about [01:19:46] projects: as you know, [01:19:55] there is going to be project team [01:19:59] mentorship this Friday; we have [01:20:01] a section on reading research papers, [01:20:03] where we'll go over object detection, [01:20:07] and there will be two papers from [01:20:09] Redmon. Okay. ================================================================================ LECTURE 010 ================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 10 - Chatbots / Closing Remarks Source: https://www.youtube.com/watch?v=IFLstgCNOA4 --- Transcript [00:00:04] So hello everyone, and welcome to the
last lecture of CS230, deep learning. [00:00:12] It's been 10 weeks, [00:00:13] and you've been studying deep learning [00:00:16] all around: starting with fully connected [00:00:19] networks, understanding how to boost [00:00:22] these networks and make them better, and [00:00:24] then using recurrent neural networks in [00:00:27] the last part and convolutional neural [00:00:29] networks in the fourth part to build [00:00:31] models for imaging, text, and other [00:00:35] applications. So today is the class [00:00:38] wrap-up, and the lecture might be [00:00:40] slightly shorter than usual, but we're [00:00:45] going to go over a small case study on [00:00:48] conversational assistants to start with, [00:00:51] which is a new topic; we will do a [00:00:55] small quiz competition with Monty, and [00:00:57] the fastest person who has the best [00:01:01] answer will win 400 hours of GPU credits [00:01:06] on Amazon, [00:01:08] so you guys can start [00:01:11] working on it. [00:01:13] We will see some class project advice, [00:01:16] because you guys have about two weeks, [00:01:19] less than two weeks, before the [00:01:21] poster presentation and the final [00:01:24] project due date. We'll also go over some [00:01:28] of the next steps after CS230 (what [00:01:31] have our students done over the past [00:01:33] year, and what we think are good next [00:01:36] steps), and closing remarks to finish. [00:01:40] By the way, if you have a clicker with [00:01:43] you, please bring it to me. [00:01:46] Okay, so let's get started with how to [00:01:49] build a chatbot to help students find [00:01:52] and enroll in the right course. This [00:01:57] is going to be a pretty simple case of a [00:01:59] chatbot, because chatbots and [00:02:01] commercial conversational assistants in [00:02:04] general have been very hard to build and [00:02:06] are still an evolving topic; there are some [00:02:08] places where academia has helped [00:02:12] chatbots improve, and here we're [00:02:15] going to see how we can take the [00:02:17] algorithms we've learned in this [00:02:19] class and plug them into a conversational [00:02:22] setting. Sounds good? So let me give [00:02:24] you an
example. Students might write to [00:02:28] the chatbot: "Hi, I want to enroll in CS [00:02:30] 106A for winter 2019 to learn coding." The [00:02:35] chatbot can answer: "For sure, I just [00:02:38] enrolled you." So that would be one goal [00:02:40] of the chatbot. A second example might [00:02:44] be finding information about classes: "Hi, [00:02:48] what are the undergraduate-level history [00:02:51] classes offered in spring 2019?" Then the [00:02:55] chatbot can get back to the student [00:02:56] and say: "Here's the list of history [00:02:58] classes offered in spring 2019." So we're [00:03:03] making a small assumption here: we're [00:03:05] building a chatbot for a very [00:03:06] restricted area. In general, a lot of the [00:03:10] time, the chatbots that work very well are [00:03:13] super goal-oriented or transactional, and [00:03:17] the space of possible [00:03:21] requests from users is small, smaller [00:03:24] than what you could expect in other [00:03:26] industrial settings. So here we're making [00:03:28] the assumption that the students will [00:03:30] only try to find information about a [00:03:32] course or will try to enroll in a [00:03:35] course. So I want you guys to pair up in [00:03:39] groups of two or three and try to come [00:03:42] up with ideas for which methods we've [00:03:46] seen together can be used to [00:03:49] implement such a chatbot, okay? So take a [00:03:52] minute, [00:03:53] introduce yourself to your mates, and [00:03:55] try to figure out which methods can be [00:03:58] leveraged in this case. Okay, let's see [00:04:03] what we have here: for natural [00:04:06] language processing, transfer learning; [00:04:09] and "an LSTM to pick out important [00:04:11] words from inputs; based on those input [00:04:14] triggers, output some predefined [00:04:16] information from storage." Yeah, so this [00:04:19] seems to say that there is going to [00:04:23] be one learning part, where we'd probably [00:04:26] have recurrent neural networks [00:04:29] helping out, and one knowledge-base [00:04:32] or storage part, where we can retrieve [00:04:34] information. We're going to see [00:04:36] some attention models; it's true that today
a lot of natural language processing [00:04:49] models are built with attention models. "An RNN for speech recognition and speech [00:04:53] generation": so, we didn't talk about the speech part; so far we assume the [00:04:56] conversational assistant is text-based, but later on we will see what happens if [00:05:01] we want to add speech to it. "Fancy [00:05:08] methods, or reinforcement learning for making decisions about responses": that's [00:05:15] interesting. So why do you guys think we would need reinforcement learning? Yes: [00:05:26] "...a sequence of different states, and you also have [00:05:27] a value associated; it's [00:05:29] very goal-oriented, and so you could sort [00:05:32] of address it in that fashion." Yeah, that's good. [00:05:34] So, just to repeat: it's important to keep [00:05:37] a notion of context; also, we have a sequence of utterances from the user and [00:05:42] the conversational assistant, and the [00:05:46] outcome of the conversation probably comes far along the way, not at every [00:05:50] step. So that's true: reinforcement [00:05:54] learning has been a research topic for [00:05:57] conversational assistants as well, and often [00:05:59] we will try to learn a policy for [00:06:02] the chatbot which, given a state, will tell us what action to take next. This [00:06:06] can be done using Q-learning, which is the method we've seen together, or [00:06:10] sometimes with policy gradients. Okay, "word encoding", [00:06:15] so word embeddings, probably? Okay, [00:06:22] cool. So I agree, there are many ways to [00:06:26] plug a deep learning algorithm into [00:06:28] this chatbot setting; we're going to see [00:06:30] a few of them. First, I'd like to [00:06:39] introduce some vocabulary which is [00:06:41] commonly used when talking about [00:06:43] conversational [00:06:45] assistants. An utterance: you can think [00:06:48] of it as a user input; if I say the [00:06:50] student utterance, it's the sentence that [00:06:53] was written by the student for the [00:06:55] chatbot, and the assistant utterance is the one [00:06:59] coming from the chatbot's side. The intent [00:07:01] denotes the intention of the user; in [00:07:04] our case we will have two
intents, which [00:07:06] is very limited: the user either wants to [00:07:08] find information about a course, or [00:07:11] the user wants to enroll in a class. [00:07:15] These are two different intentions that [00:07:17] probably have to be detected early in the [00:07:20] conversation. And then you have something [00:07:22] called slots. Slots are used to gather [00:07:26] multiple pieces of information from the user for a [00:07:29] specific intent that the user has. So [00:07:32] let's say the student wants to enroll [00:07:34] in a class; in order to enroll the [00:07:36] student in a class, you need to fill in [00:07:38] several slots: [00:07:39] you need to understand which [00:07:41] class the student is talking about, which [00:07:45] quarter the student wants to enroll in [00:07:47] the class, which year the student is [00:07:49] talking about, and eventually you want to [00:07:51] know the SUID of the student; but [00:07:55] probably we can assume that the SUID is [00:07:57] already encoded in the conversation or the [00:07:59] environment we're in. So those are three [00:08:03] vocabulary terms, and we're also going to talk [00:08:05] about turns for conversational [00:08:08] assistants. A single-turn [00:08:11] conversation is when there is just a [00:08:14] user utterance and a response; multi-turn [00:08:18] is when there are several user [00:08:19] utterances and conversational-assistant [00:08:24] utterances, and you understand that [00:08:28] multi-turn conversations are harder [00:08:30] to handle because we need to track [00:08:32] context. Our assumption today will be [00:08:36] that we work in an environment with [00:08:37] limited intents and slots: we [00:08:40] can define two intents, and for each of [00:08:41] these two intents there are several [00:08:42] slots that we want to fill in. This is going [00:08:44] to make our life easier; of course, in [00:08:48] practice you can have myriads of [00:08:51] intents and slots, and the task [00:08:54] becomes more complicated when you have [00:08:56] more of those. So my first question would [00:09:00] be: how do we detect the intent based on the [00:09:06] user utterance? Can you talk about what [00:09:09] kind of data set you would need to build in [00:09:11] order to train a model to detect the [00:09:12] intent, [00:09:25] or what type of network you would need? There [00:09:37] is not a single good answer, so go for it; [00:09:40] it's your brainstorm. "So I think there [00:09:43] are going to be two options, obviously: [00:09:46] because we have a sequence [00:09:47] coming in, which is the user input, we [00:09:50] might want to use a recurrent neural [00:09:51] network to encode long-term dependencies, [00:09:54] or you might want to use a convolutional [00:09:56] network." Actually, convolutional networks [00:10:00] have some benefits that recurrent [00:10:02] neural networks don't have, and they [00:10:04] might work better; for example, if the [00:10:06] intent we're looking for is always [00:10:08] encoded in a small number of words [00:10:10] somewhere in the input sequence, because [00:10:13] you will have a filter scanning the sequence, and [00:10:15] the filter can detect the intent. So if [00:10:17] you have a filter that was trained [00:10:19] to detect the intent "inform", [00:10:21] another filter trained to detect
the intent and roll then these two filter [00:10:26] intent and roll then these two filter will detect the word enroll or the word [00:10:29] will detect the word enroll or the word I'm looking for and so on in order to [00:10:31] I'm looking for and so on in order to detect the intent okay in terms of data [00:10:35] detect the intent okay in terms of data what you probably need is pairs of user [00:10:38] what you probably need is pairs of user utterances along with the intent of the [00:10:41] utterances along with the intent of the user so you would need to label the [00:10:44] user so you would need to label the datasets like this one with X and input [00:10:46] datasets like this one with X and input I want to so it's padded I want to [00:10:48] I want to so it's padded I want to enroll in CS 106a for winter 2019 to [00:10:50] enroll in CS 106a for winter 2019 to learn coding and this you will label it [00:10:53] learn coding and this you will label it as enroll and notice that enroll here is [00:10:57] as enroll and notice that enroll here is a function so the label is actually [00:11:00] a function so the label is actually noted as a function and the reason is [00:11:02] noted as a function and the reason is because we can call this function in [00:11:04] because we can call this function in order to issue information [00:11:06] order to issue information another example is hi what are the [00:11:08] another example is hi what are the undergraduate level history classes [00:11:09] undergraduate level history classes offered in spring 2018 and this would be [00:11:11] offered in spring 2018 and this would be label as in form so it's probably a two [00:11:15] label as in form so it's probably a two class classification or three classes if [00:11:17] class classification or three classes if you want to add a third class that [00:11:20] you want to add a third class that corresponds to other intents a user [00:11:23] corresponds to other intents a user might 
want to use this chat bot for [00:11:25] might want to use this chat bot for another intent that the chat bar wasn't [00:11:27] another intent that the chat bar wasn't built for so these are the [00:11:30] built for so these are the classes enroll in inform and what's [00:11:32] classes enroll in inform and what's interesting is that if we identify that [00:11:34] interesting is that if we identify that the intent of the user is enroll we [00:11:37] the intent of the user is enroll we probably want to call an API or to [00:11:39] probably want to call an API or to request information from another server [00:11:41] request information from another server and in this case it might be access [00:11:43] and in this case it might be access because the the platform we use to [00:11:45] because the the platform we use to enroll in classes is access and same to [00:11:49] enroll in classes is access and same to retrieve information in order to help [00:11:50] retrieve information in order to help the user about their classes we can [00:11:52] the user about their classes we can probably call explore courses assuming [00:11:55] probably call explore courses assuming that these these services have api's [00:12:00] that these these services have api's these surfaces have api's does that make [00:12:03] these surfaces have api's does that make sense and now the interesting part is [00:12:06] sense and now the interesting part is that the unroll function might request [00:12:08] that the unroll function might request some inputs that you have to identify [00:12:12] some inputs that you have to identify those will be the slots same for the [00:12:14] those will be the slots same for the inform function okay so we could train a [00:12:19] inform function okay so we could train a sequence classifier either convolutional [00:12:21] sequence classifier either convolutional or record and this we're not going to go [00:12:24] or record and this we're not going to go into the details 
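To make the input/output shape concrete, here is a minimal sketch of what such a labeled intent dataset could look like. The keyword matcher is only a toy stand-in for the CNN/RNN sequence classifier discussed above, and the names (`detect_intent`, the third example utterance) are illustrative, not from the lecture:

```python
# Minimal sketch of an intent-classification dataset. Labels name the
# function the chatbot would call (enroll / inform), plus a catch-all
# "other" class. The keyword matcher is a toy stand-in for a trained
# CNN/RNN classifier, shown only to illustrate the input/output shape.

dataset = [
    ("I want to enroll in CS 106A for winter 2019 to learn coding", "enroll"),
    ("Hi, what are the undergraduate level history classes offered in spring 2018?", "inform"),
    ("Please play some music", "other"),  # an intent the bot wasn't built for
]

def detect_intent(utterance):
    """Toy intent detector: a real system would use a trained sequence model."""
    text = utterance.lower()
    if "enroll" in text:
        return "enroll"
    if any(kw in text for kw in ("what", "which", "when", "offered")):
        return "inform"
    return "other"

for utterance, label in dataset:
    assert detect_intent(utterance) == label
```

This mirrors the lecture's point about convolutional filters: a filter that fires on "enroll" (or on "what ... offered") is enough to separate the two intents when they are encoded in a few words of the input.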
[00:12:24] You've learned in the sequence models course how to do that. Now, how do we detect the slots? In terms of data, it's going to look very similar to the previous case, but now we have a sequence-to-sequence problem, where the user utterance is a sequence of words and the slot tags are also a sequence. For example: "Show me the Tuesday fifth of December flights from Paris to Kuala Lumpur." If you were to build a conversational assistant for flight booking, then the label you want is probably something like that — it doesn't have to be exactly this. But why do we denote O for some of the words? The sequence is B-day, I-day, B-dep, B-arr, and so on — what do you think these correspond to, and why do we need that? You've probably seen this in the sections a few weeks back. So why do we denote these labels in this particular format?

[00:13:51] [Student answer] Yeah, correct — I agree with what you said for day, dep and arr: these encode the day, the departure and the arrival. How about the B, the I and the O — someone has an idea? [Student: B is the beginning; sometimes these things span more than one word.] Yeah, exactly. B denotes beginning, I denotes in or inside, and O denotes out or outside. So what happens is that sometimes you will have a slot which is filled by several words, not a single word, and you want to be able to detect this entire chunk — it's called chunking. So you use a special encoding in order to identify whether a word is the beginning of a chunk that fills a slot, or inside it, or outside of it. Day, departure and arrival are three possible slots that we want to fill in order to be able to book the flight; if you don't receive these slots, you might want to have your chatbot request them later.

[00:15:01] Okay, so another example. The classes here can be day, departure, arrival, class — like whether you want to travel in economy or business — and the number of passengers you want on your flight. For our chatbot here it would be: "Hi, I want to enroll in CS 106A for winter 2019 to learn coding", and we would encode it with the beginning of the code of the class, the beginning of the quarter, and the beginning of the year — that would be a possible encoding. And then you would train, probably using a recurrent neural network, an algorithm to predict all the tags. Does that make sense? So now we have already two models running on our chatbot: one for the intents and one for the tags.

[00:15:56] What do you think about joint training? Do you think it's something we could do here? And what do I mean by joint training?
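As an aside, the B/I/O chunking scheme just described can be sketched in a few lines — a small helper that recovers multi-word slot values from a tag sequence. The exact tag names (B-day, B-dep, B-arr) are one possible labeling, as noted above, not a fixed standard:

```python
# Recovering slot chunks from a BIO tag sequence ("chunking").
# B-xxx marks the beginning of a slot value, I-xxx continues it, O is outside.

def bio_chunks(tokens, tags):
    """Group tokens into (slot, value) chunks according to their BIO tags."""
    chunks, current_slot, current_words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new chunk begins
            if current_slot:
                chunks.append((current_slot, " ".join(current_words)))
            current_slot, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_slot == tag[2:]:
            current_words.append(token)     # the chunk continues
        else:                               # O (or an inconsistent I-) ends it
            if current_slot:
                chunks.append((current_slot, " ".join(current_words)))
            current_slot, current_words = None, []
    if current_slot:
        chunks.append((current_slot, " ".join(current_words)))
    return chunks

tokens = "show me the Tuesday fifth of December flights from Paris to Kuala Lumpur".split()
tags   = ["O", "O", "O", "B-day", "I-day", "I-day", "I-day", "O", "O",
          "B-dep", "O", "B-arr", "I-arr"]
print(bio_chunks(tokens, tags))
# -> [('day', 'Tuesday fifth of December'), ('dep', 'Paris'), ('arr', 'Kuala Lumpur')]
```

Note how "Tuesday fifth of December" and "Kuala Lumpur" come back as whole chunks — exactly why the B/I distinction is needed for multi-word slot values.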
[00:16:15] [Student] Training on all the different codes — like training to detect the quarter, the year and the class — or training a separate network for each of them? Is that the joint element of the training?

No, I was talking more about training for different tasks: intent classification — the inform intent and the enroll intent — and slot tagging. Here we have one intent classifier, which takes an input sequence and outputs a single class, and we have a slot tagger, which takes exactly the same input and tags every single word in the sequence. So probably we can use joint training in order to train one network that is able to do both, and this network will be jointly trained with two different loss functions: one for the intent and one for the slot tagger. It's usually helpful to jointly train two networks, especially in the earlier layers, because you end up learning the same type of features — that's interesting for natural language processing.

[00:17:21] [Student] Is there a single loss function for both — do you calculate both losses and sum them together, or is there a trade-off between them?

So the question is: how would you describe the loss function in this joint training, since there are actually two loss functions? You would just sum the two loss functions you are using, and hope that backpropagation will train both networks. The networks will probably have a common base and then be separated afterwards. So let's say you have a first LSTM layer that encodes some information about the user utterance; this layer then gives its output to two different networks, which will be trained separately. Okay. And the classes here are the code of the class, the quarter, the year, and the SUID.
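The summed joint objective just described can be sketched with toy numbers. The probabilities below are made-up model outputs (not from the lecture), just to show the arithmetic of adding an intent loss and a per-token slot loss:

```python
import math

# Sketch of the joint-training loss: a shared encoder feeds two heads, and we
# simply sum the two cross-entropy losses before backpropagating. The
# probabilities are made-up model outputs, only to show the arithmetic.

def cross_entropy(p_correct):
    """Negative log-probability assigned to the correct class."""
    return -math.log(p_correct)

# Head 1: intent classifier — one prediction per utterance.
intent_loss = cross_entropy(0.9)         # p(enroll) for a true "enroll" utterance

# Head 2: slot tagger — one prediction per token; average the per-token losses.
per_token_p = [0.8, 0.95, 0.7, 0.99]     # p(correct tag) for each of 4 tokens
slot_loss = sum(cross_entropy(p) for p in per_token_p) / len(per_token_p)

# Joint objective: just the sum (a weighted sum is also common in practice).
joint_loss = intent_loss + slot_loss
print(round(joint_loss, 4))
```

Backpropagating this single scalar updates both heads and, through them, the shared encoder — which is why the early layers end up learning features useful for both tasks.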
[00:18:12] Assuming the SUID is already in the environment, we will not need to request it. So, can you tell me how to acquire this data, now that we've seen it? Take about a minute to discuss with your neighbors how to acquire that type of data, and then answer on Menti.

[00:18:33] Okay, so let's go over some of the answers. "Mechanical Turk — have people manually collect and annotate the data." That's true. As we discussed earlier in the quarter, this is probably the most rigorous method when it's applied with a specific labeling process and data collection process. It will take more time: you would have to build a UI, a user interface, for the labelers to be able to label all this data, which is not trivial in general — Amazon Mechanical Turk, or a large number of Stanford students. "Have a human chat assistant serve the users, and enter the data by hand as labeled data." "You can start with hand labeling, and probably also generate some data by substituting different courses, quarters and other tags." Oh, that's a good idea — who wrote that? Someone wants to comment?

[00:19:45] Yeah, that's a good idea, so let me repeat it for the SCPD students. We already have a bunch of possible dates — we can easily find a list of dates. You've done it in one assignment, right, where you were using neural machine translation to translate human-readable dates into machine-readable dates; so we have datasets of dates, and we could use those. We also have a list of courses that we can probably find on ExploreCourses; we know that there are not too many quarters; and we probably have databases for any other tag — a list of possible SUIDs, which are seven-figure numbers, something like that. And then we can take sentences with blank spots where we insert these values, and we can generate a lot of data using this insertion scheme, automatically — and every time we insert, we can label, because we know what we inserted.

[00:20:41] I like this idea as well: "Use a part-of-speech tagger or a named-entity-recognition model to identify examples of requests that are found elsewhere." One thing we discussed in section is that there are models available to do part-of-speech tagging, right? So why don't we use them? They are trained really well, and we could take user utterances that we collected online and tag them automatically using these good models. Of course it's not going to be perfect, but we can at least get started with that, and leverage a model that someone else has built to tag and label our dataset. Okay, good ideas here.

[00:21:28] So let's see the data generation process, which is probably the easiest strategy to start with. Talking about the flight-booking virtual assistant: we would have a database of all the departure locations — Paris, London, Kuala Lumpur — and of arrivals as well, so these are lists of cities in the world that have airports. We would have a list of ways to write dates, and also classes — business, economy, economy plus, premium, whatever you want — and user utterances. Then what we do is pull a user utterance from the database, such as: "I would like to book a flight from [dep] to [arr] in [class] class for [day]", and plug in, randomly from the datasets, slots that make sense. We can generate a lot of data using this process: this single user utterance can be augmented into virtually tens or hundreds of different combinations.

[00:22:49] So that's one way to augment your dataset automatically and label it. But you also need hand-labeled data, because you don't want your model to overfit to this specific type of user utterance. Same for our virtual assistant for the university: "Hi, I want to enroll in [code] for [quarter] [year]", and then we can insert the quarter, the year, and the code of different classes from the database, so that we can train our network on that. Does this data augmentation make sense? These are common tricks you will see in various papers, and this is an example of one of them. Okay — so we can label automatically when inserting, and we can train a sequence-to-sequence model in order to fill in the slots. So let's go on Menti and start the competition: which one is the most fun?
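The insertion scheme above can be sketched as a small generator that fills a template from value lists and emits the BIO tags automatically, since we know what we inserted. The template, slot names and example values are illustrative, not an actual course database:

```python
import random

# Sketch of the insertion scheme: plug database values into a template and
# emit BIO slot tags automatically. Template and values are illustrative.

database = {
    "code":    ["CS 106A", "CS 230", "HISTORY 51"],
    "quarter": ["winter", "spring", "autumn"],
    "year":    ["2018", "2019"],
}
template = "hi I want to enroll in {code} for {quarter} {year}"

def generate_example(rng):
    """Return one (tokens, tags) training pair with auto-generated BIO labels."""
    tokens, tags = [], []
    for piece in template.split():
        if piece.startswith("{"):                    # a slot placeholder
            slot = piece.strip("{}")
            words = rng.choice(database[slot]).split()
            tokens += words                          # insert the value's words
            tags += ["B-" + slot] + ["I-" + slot] * (len(words) - 1)
        else:
            tokens.append(piece)
            tags.append("O")                         # non-slot words are Outside
    return tokens, tags

rng = random.Random(0)
tokens, tags = generate_example(rng)
print(list(zip(tokens, tags)))
```

With 3 codes, 3 quarters and 2 years, this one template already yields 18 distinct labeled utterances — which is the "tens or hundreds of combinations" point, and also why hand-labeled data is still needed to avoid overfitting to the template's phrasing.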
[00:24:05] user a chance I want to enroll in CS 106 a winter 2019 to learn coding it [00:24:08] a winter 2019 to learn coding it identifies the intent of the user using [00:24:11] identifies the intent of the user using sequence classifier same type of network [00:24:14] sequence classifier same type of network as you've built for the mo GFI [00:24:15] as you've built for the mo GFI assignment and then it also runs another [00:24:19] assignment and then it also runs another algorithm which will fill in the slots [00:24:21] algorithm which will fill in the slots and here we have all the slots needed we [00:24:25] and here we have all the slots needed we have the code for the class we have the [00:24:26] have the code for the class we have the quarter and we have the year thus unit [00:24:28] quarter and we have the year thus unit ID is implicitly given so we're able to [00:24:31] ID is implicitly given so we're able to enroll to enroll the students by calling [00:24:33] enroll to enroll the students by calling access with all these slots done now [00:24:36] access with all these slots done now let's make it a little more complicated [00:24:38] let's make it a little more complicated let's say the students say hi I want to [00:24:40] let's say the students say hi I want to enroll in CS 106 a 2 to learn coding so [00:24:45] enroll in CS 106 a 2 to learn coding so the difference between these utterance [00:24:47] the difference between these utterance and the previous one example one is that [00:24:49] and the previous one example one is that you don't have all the slots you [00:24:52] you don't have all the slots you identify with your slots tagger that's [00:24:55] identify with your slots tagger that's CS 106 a is the coder of the class but [00:24:57] CS 106 a is the coder of the class but you don't know the culture you don't [00:24:59] you don't know the culture you don't know the year so you probably want your [00:25:01] know the year so you probably want your 
chat bot to get back to the to the [00:25:02] chat bot to get back to the to the student and say for which quarter would [00:25:04] student and say for which quarter would you like to enroll right and the student [00:25:08] you like to enroll right and the student would hopefully say winter 2019 or [00:25:10] would hopefully say winter 2019 or winter and then you have to ask for the [00:25:12] winter and then you have to ask for the year 2019 and finally you can say for [00:25:15] year 2019 and finally you can say for sure I just enrolled you so we're not [00:25:18] sure I just enrolled you so we're not making any assumption here on natural [00:25:19] making any assumption here on natural language generation you've worked on a [00:25:21] language generation you've worked on a Shakespeare assignment where you [00:25:23] Shakespeare assignment where you generate Shakespeare like sentences in [00:25:25] generate Shakespeare like sentences in fact a good shot boat would have this [00:25:28] fact a good shot boat would have this feature of generating language but for [00:25:30] feature of generating language but for our purpose which can just hard code [00:25:32] our purpose which can just hard code that when you're able to enroll the [00:25:33] that when you're able to enroll the students you just say I just enrolled [00:25:35] students you just say I just enrolled you when you were able to retrieve [00:25:37] you when you were able to retrieve information from the students you would [00:25:38] information from the students you would just write here is some information and [00:25:40] just write here is some information and you would plug in whatever the explore [00:25:42] you would plug in whatever the explore course is API sent back in a JSON okay [00:25:46] course is API sent back in a JSON okay so here the idea is this student [00:25:49] so here the idea is this student utterance cannot be understood without [00:25:52] utterance cannot be understood without context 
there is no way to understand [00:25:54] "winter 2019" if you don't have a context management system. Does it make sense? So we want to build that context management system, and then the question is how to handle context. There are many ways to do that, and people are still searching for the best ways. One way is to handle it with reinforcement learning, as you mentioned earlier. Another way, which is quite intuitive and closer to what we've seen together in the sequence models module, module five, is this type of architecture, taken from Chen et al., "End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding" — so now you're able to understand what multi-turn and end-to-end memory network mean.

[00:26:42] So what happens here, just to describe it, is we will save all the history utterances: from the beginning of the conversation we will record all the utterances and messages exchanged between the user and the assistant, and keep them in a storage that will be called "history utterances." c is the current utterance. So let's say the student says "winter 2019" — this is the utterance of the student at this point. This utterance will be run through an RNN, and we will get back an encoding of this sentence. There's all the word embedding stuff that I won't describe, but you guys are used to it: we use word embeddings, we run them through an RNN, and we get back the encoding u of the user utterance. This encoding will then be compared to what we have in memory: all the user utterances that we had in memory are also going to be run through an RNN that will encode their information in vectors.

[00:27:49] These vectors are going to be put in a memory representation, and u will be compared to them directly by inner product: we take an inner product of u with all the memories, and this, pooled into a softmax, gives us a vector of attention — which you guys should be used to by now — a knowledge attention distribution telling us where we should put our attention in the memory for this specific utterance. Does that make sense? So a simple inner product plus softmax gives us a series of weights. Then we take a weighted sum of the memories, multiplied by these attention weights, and it gives us a vector that encodes the relevance of the memory with regard to our current utterance. This is then summed with u and run through a simple matrix multiplication to get an output vector, which is fed into the slot tagging sequence; and usually — it's experimental — they also pass the current utterance directly to the RNN tagger.
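The attention step just described — inner product with the memories, softmax, weighted sum, then an output projection — can be sketched in a few lines of numpy. The dimensions, the random vectors standing in for learned RNN encodings, and the single projection matrix are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sizes: d-dim sentence encodings, n past utterances.
d, n = 8, 5
rng = np.random.default_rng(0)

u = rng.normal(size=d)        # RNN encoding of the current utterance ("winter 2019")
M = rng.normal(size=(n, d))   # RNN encodings of the n history utterances (the memory)
W = rng.normal(size=(d, d))   # learned output projection (stand-in)

# Inner product of u with every memory, pooled through a softmax:
# the knowledge attention distribution over the history.
p = softmax(M @ u)            # shape (n,), sums to 1

# Weighted sum of the memories by the attention weights.
h = p @ M                     # shape (d,)

# Summed with u and passed through a simple matrix multiplication
# to get the output vector fed to the slot tagger.
o = W @ (h + u)               # shape (d,)
```

In the paper this output is consumed by the RNN slot tagger together with the current utterance; here it is just a vector you could concatenate into the tagger's input.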
[00:28:58] And the RNN tagger comes up with the slot tagging. So using that, you can understand that "winter 2019" is actually the target for the slot "quarter," because you have this memory network. Does it make sense? So this is another type of attention model you may want to use, and this memory network can be used to manage some context for the slot tagger. Okay.

[00:29:27] So just to recap: we have our example, "Hi, I want to enroll in a class," and we detect the intent, which is enroll. We also detect that there are some slots missing, because we know that the enroll function needs the quarter, the year, and the class in order to be called, so we have to ask for those. We probably hard-coded the fact that if you don't have the quarter, the year, and the class, you first want to ask for the class or the quarter or the year; then
you can get back to the person by asking which class they want to enroll in. [00:30:00] The person gets back to you, you use your memory network to understand that "CS 230" is a slot for the enroll intent, and you fill it in. So now we have our intent, with class = CS 230, and we have our slots quarter and year, which are still to be filled. The chatbot gets back asking for the quarter, and hopefully the student gives you the year at the same time, so you can fill in the slots — and then you are enrolled in CS 234, winter 2019. Yeah, it should be spring — yeah, this chatbot is not trained very well. Okay, any questions on that? [00:30:44] So this is a very simple case of a conversational assistant, just to give you some ideas. There are some papers listed in the presentation that you can go to in order to get more advanced research insights. But the idea here is that we're limited to a specific intent — to two specific intents — and a few slots.
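The hard-coded "ask for whatever slot is missing, then call the API" logic in the recap can be sketched as a tiny dialogue manager. The intent name, slot names, and prompts below are invented for illustration:

```python
# Toy dialogue manager for a single hard-coded intent, as in the recap:
# the "enroll" intent needs class, quarter, and year before the API call.

REQUIRED_SLOTS = {"enroll": ["class", "quarter", "year"]}

PROMPTS = {
    "class": "Which class do you want to enroll in?",
    "quarter": "For which quarter?",
    "year": "Which year?",
}

def next_action(intent, slots):
    """Ask for the first missing slot, or make the API call when all are filled."""
    missing = [s for s in REQUIRED_SLOTS[intent] if s not in slots]
    if missing:
        return PROMPTS[missing[0]]
    return f"API call: {intent}({slots})"

state = {}
print(next_action("enroll", state))                    # asks for the class first
state["class"] = "CS 230"                              # filled in by the slot tagger
state["quarter"], state["year"] = "winter", "2019"
print(next_action("enroll", state))                    # all slots filled -> API call
```

A real system would plug the intent classifier and slot tagger in where the dictionary updates are done by hand here.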
[00:31:04] What do you think we would need if we didn't restrict ourselves to specific intents and slots? [00:31:24] It's a very complicated, tough one. An industrial way to do it is to use a knowledge graph. What it means is: let's say you're an e-commerce platform. You probably have a knowledge graph of all the items on the platform, with connections among them. Let's say you have a shoe. A shoe is a slot that might be the object for the intent "I want to buy something," right? The shoe can have several attributes, like color, or size, or men or women — gender — and all of these are connected together in a gigantic knowledge graph, and you will follow the paths of this knowledge graph following some probabilities. So when we detect the intent of the user, which is "buy something,"
[00:32:24] we could identify the object — "I want to buy a shoe" — and then, based on our knowledge graph, the next question we should ask, or the next slot we need to fill, is which brand you want your shoe to be. And so the knowledge graph is going to tell you: with 60% probability, go to "brand" and ask about the brand. Once you're there, what other information do you need in order to be able to retrieve five results for the user to review? And so on. So the knowledge graph is something in this field that can be used in order to have multiple intents, and multiple slots for every intent. Okay. And at the end we can make an API call here, with class CS 230, quarter winter, year 2019, and the SUID. Okay.

[00:33:11] Another question I have for you is how to evaluate the performance of a chatbot. What do you think? [00:33:33] So, there are common ways to evaluate several parts of your pipeline.
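The knowledge-graph idea from a moment ago can be sketched as a graph with probability-weighted edges; here the assistant greedily follows the most probable edge to a slot that is not yet filled (a real system might sample the edge instead). The items, attributes, and probabilities are all invented:

```python
# Toy knowledge graph for the e-commerce example: after detecting the
# "buy" intent with object "shoe", edges carry probabilities telling
# the assistant which slot to ask about next.

graph = {
    "shoe": [("brand", 0.6), ("color", 0.25), ("size", 0.15)],
    "brand": [("size", 0.7), ("gender", 0.3)],
}

def next_slot(node, filled):
    """Follow the highest-probability edge to a slot not yet filled."""
    candidates = [(s, p) for s, p in graph.get(node, []) if s not in filled]
    if not candidates:
        return None  # nothing left to ask: ready to retrieve results
    return max(candidates, key=lambda sp: sp[1])[0]

print(next_slot("shoe", set()))        # 'brand' — the 0.6 edge wins
print(next_slot("shoe", {"brand"}))    # 'color' — next most probable
```

Taking the arg-max edge is a simplification of "go to brand with 60% probability"; swapping `max` for a weighted random choice would match the probabilistic phrasing.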
[00:33:39] How is your slot tagger doing, for example? How is your intent classifier doing? You can use metrics such as precision and recall, or the F1 score for a mix of both, and report those in order to compare how each module is doing inside the chatbot. But ultimately you want to understand how good your chatbot is overall. So some experiments are done — and this is a paper on a deep reinforcement learning chatbot built in 2017 by Serban et al. at MILA — and what they did is use Mechanical Turk in order to evaluate their chatbots, and also to build a scoring system for their reinforcement learning chatbot. So I'm reading the instructions for you: "You will be presented with a conversation between two speakers, speaker A and speaker B. You will also be presented with four potential responses from one of the speakers for this dialogue, and the task is for you to rate each of the
responses between one — inappropriate, doesn't make sense — [00:34:38] and five — highly appropriate and interesting — based on how appropriate the response is to continue the conversation. Three is neutral. If two responses are equally appropriate, you should give them the same score, and if you see a response that is not in English, please give it a score of one." So here is what happens from a user's perspective. You would have a conversation: "You need to work on your English." "Why do you say that about me?" "Well, your English is very poor." So this is the conversation, and then response one is "But English is my native language." Response two is "What other reasons come to mind?" Response three is "Here is a funny fact: 'Go' is the shortest complete sentence in the English language." And then the fourth response is "Bye doggy." So obviously you have to score these responses according
[00:35:41] to how relevant you think they are, and then these scores will be used either for the scoring system of the deep reinforcement learning chatbot, or to evaluate how good your chatbot is compared to other chatbots — so maybe each of these responses comes from a different model. Does that make sense? [00:36:00] So these are a few ways. There's another way, which is asking for the opinion of the user on different responses. Let's say you're a user and you are comparing two chatbots: you can give your opinion on which one you think is more natural, and you would ask a lot of users to do that, to compare two or three chatbots together, and also compare them to natural language from a human. Then, by doing a lot of mean opinion score experiments, you can evaluate which chatbots are better than the others, just comparing them one-on-one. Okay.
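The precision, recall, and F1 metrics mentioned above for evaluating a module like the intent classifier can be computed from scratch in a few lines; the toy gold and predicted labels here are invented:

```python
# Precision, recall, and F1 for a binary intent classifier,
# computed on a toy set of gold and predicted labels (1 = "enroll").

gold = [1, 1, 0, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)                           # 3 / 4 = 0.75
recall = tp / (tp + fn)                              # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean: 0.75

print(precision, recall, f1)
```

In practice you would report these per intent (and per slot type for the tagger) rather than over one binary label.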
[00:36:46] Now, getting back to one thing that a student mentioned earlier: what if we want to have a vocal assistant? Right now our assistant is not vocal, it's just text. What other things do we need to build in order to make it a vocal assistant? [00:37:04] We're not going to go into the details, but roughly you would need a speech-to-text system, which will take the voice of the user and convert it into text — and this, as you've seen in the sequence models class, has different steps in its pipeline — and also a text-to-speech system, which takes the text from the chatbot and converts it into a voice. That's how you get virtual assistants talking to us: they have a text-to-speech system running. And these are three papers: the first one is Deep Speech 2 from Baidu's team, which built end-to-end speech recognition in English and Mandarin, and the two
others are text-to-speech synthesis: [00:37:47] one came out in February 2018, which is Tacotron 2, and the second one is WaveNet, which is a very popular generative model. These are far beyond the scope of the class, but you can study them in other classes at Stanford that are more specific to speech. Okay, class project advice. [00:38:07] So this Friday we're going to go over again the rubric of what we look at when we grade projects, and here is the list of things we will look at. Make sure you have a very good problem description: when you read papers you see that there is a very good abstract, and we expect you to give us a very good abstract, so that when we read it we get a good understanding of the paper. Hyperparameter tuning: always report what you did. You don't need to be very exhaustive, but just tell us which hyperparameters you've been choosing, and which ones you've
been testing, [00:38:41] and why they didn't work. Another thing we look for is typos — this is common in the grading scheme — typos and clear language, so peer review your paper. Explanation of choices: this is a very important part. We expect you to explain the decisions you're making. We don't want you to tell us "I've made that decision" without explaining, but rather: "There is this paper that mentioned this architecture worked well on that specific task; I've tried three architectures, here are my hyperparameters and results, and that's why I'm going to dig more into that one," and so on. Data cleaning and preprocessing: if applicable to your project, explain it. How much code you wrote on your own: it's important to us, so please submit your GitHub, or share it privately with the TAs when you submit
your projects — [00:39:38] it's going to make it easier for us to review the code. Insights and discussions: include the next steps — what would you have done if you had more time? — and also interpret your results. Don't just give results without explanation; rather, try to extract information from these results, which can also drive your next steps. Results are important, but if you don't have the results you expected, it's fine: we will look at how much work you've done, and some tasks are very complicated — we don't expect you to beat state-of-the-art on every single task. Some of you are going to beat state-of-the-art, hopefully, but those of you who don't should still report all your results and explain why it didn't work. Give references. And there is a penalty for more than five pages: if you're working on a theoretical project you can add additional pages as an appendix — you can also add an appendix for your
project, [00:40:36] but the core has to be five pages. And for the final poster presentation, which will happen not this Friday but the next one, we will ask you to pitch your project in three minutes. Not everyone in the group has to talk — at least one person has to talk, and we prefer if several of you talk — but you have three minutes to pitch your project, so prepare the pitch in advance, and you will have two minutes of questions from the TAs, which are also part of the grade. [00:41:06] Okay, finally: what's next after CS 230? There are a ton of classes at Stanford; we're in a great learning environment. Next steps can be, within the university, classes in natural language processing and computer vision, but also classes from different departments. Deep generative models is a good way to learn about text-to-speech, for example, or GANs. Probabilistic graphical models is also a
very important class in the industry. [00:41:36] From the CS department, of course, if you haven't taken them yet: CS 229 Machine Learning, or CS 229A Applied Machine Learning, are two ways to go to learn machine learning. Reinforcement learning is a class where you can delve more into Q-learning, policy gradients, and all these methods that sometimes use deep learning. We're going to publish that list in case you want to check it, but these are examples of classes you can take, and of course there are other classes, not mentioned here, that might be relevant to pursue your learning in deep learning and machine learning. [00:42:12] Okay, that said, I'm going to give the microphone to Andrew for closing remarks. Yeah, good luck on your projects — we'll see you on Friday for the discussion sections, and next week for the final project. [Andrew takes the microphone.] Oh — so, all
right, [00:42:41] here we are at the end of this class — nearly at the end of this class. [00:42:45] You know, the NeurIPS conference is taking place right now — formerly the NIPS conference, renamed to NeurIPS — and I remember, it was ten years ago that a PhD student presented a workshop paper at NIPS telling people: hey, consider using GPUs and CUDA — which was a new thing that NVIDIA had just published — to train your networks. And we had done that work on a GPU server that Ian Goodfellow, the creator of GANs, had built in his dorm room when he was an undergrad at Stanford. So our first GPU server was built in a Stanford undergrad's dorm. And I remember sitting down with Geoff Hinton and saying, hey, check out this CUDA thing, and Geoff said, no, GPU programming is really hard — but then: oh, maybe this
CUDA thing [00:43:37] looks promising. And I tell the story because I want you to know, as Stanford students, that your work can matter, right? When Ian Goodfellow built that GPU server in his dorm room, I had no idea — and I don't know if he realized — that a decade later someone would be winning several hundred hours of AWS credits to train bigger deep learning algorithms. But I think here at Stanford University we're very much at the heart of the technology world — I think Silicon Valley is here in large part because Stanford University is here — and we live in a world where, with the superpowers that you now have, you have a lot of opportunities to do new and exciting work, which may or may not seem like it matters in the short run — it may even seem inconsequential in the short run — but can have a huge impact in the long run. [00:44:37] A couple of weekends ago — so, um, my wife
roast coffee beans at home right my [00:44:42] we roast coffee beans at home right my wife buys raw coffee beans and then we [00:44:44] wife buys raw coffee beans and then we actually roast them and camera or my [00:44:46] actually roast them and camera or my wife tends to roast em and its really [00:44:48] wife tends to roast em and its really cheap popcorn popper that we have right [00:44:50] cheap popcorn popper that we have right now so I don't know I don't have much [00:44:53] now so I don't know I don't have much coffee you guys drink I drink a lot of [00:44:54] coffee you guys drink I drink a lot of coffee and so you know so Carol byesies [00:44:57] coffee and so you know so Carol byesies being coffee bean see she puts them in [00:44:59] being coffee bean see she puts them in this like cheap popcorn popper which is [00:45:01] this like cheap popcorn popper which is made for popping popcorn not made for [00:45:03] made for popping popcorn not made for rose and coffee beans this is one of the [00:45:04] rose and coffee beans this is one of the standard cheap ways to roast coffee [00:45:06] standard cheap ways to roast coffee beans and and I love my wife I drink the [00:45:09] beans and and I love my wife I drink the coffee she makes but sometimes she burns [00:45:10] coffee she makes but sometimes she burns the coffee beans so I found this article [00:45:13] the coffee beans so I found this article on the internet from a former student [00:45:16] on the internet from a former student that written an article and how they use [00:45:19] that written an article and how they use machine learning to roast to optimize [00:45:22] machine learning to roast to optimize the roasting of coffee beans as I [00:45:24] the roasting of coffee beans as I forwarded to the Carol she wasn't very [00:45:27] forwarded to the Carol she wasn't very happy about that and but I raised this [00:45:31] happy about that and but I raised this is another example of how all of you you 
[00:45:37] know, I would never have thought of applying machine learning to roasting coffee beans. I like my coffee, but it had never occurred to me to do that. But someone taking a machine learning class, like you guys are, went ahead and came up with a better way of roasting coffee beans using learning algorithms. [00:45:57] And again, I don't know if the person who wrote this blog post was thinking of building a business out of it; there might be a business, there might not, or it might be just a fun personal hobby, I actually don't know. But all of you with these skills have that opportunity. [00:46:09] And then again, earlier this week, was it Monday night, a group of us were actually in the Gates building, where a bunch of students from the AI for Healthcare boot camp that Kian alluded to
[00:46:28] were going over some of the final projects for the students in the AI for Healthcare boot camp that we're working on. [00:46:34] And I think I actually met several people, including Aarti, when she first participated in a much earlier version of that boot camp (you can also ask Aarti or others if you're interested). [00:46:47] One of the master's students I was working with, who sees patients in primary care (I think he's been in this class), was demoing an app where you could pull up an x-ray film, take a picture with your cell phone, upload the picture to a website, and have the website read the x-ray and suggest a diagnosis for the patient. [00:47:09] Most of the planet today has insufficient access to radiology services. There are many countries where it costs you three months of salary to go and get an x-ray taken, and then maybe try to find a radiologist to read it. [00:47:24] But most of
the planet, billions of people on this planet, do not have sufficient radiology services. [00:47:32] And while the Stanford AI for Healthcare boot camp is still a research project (actually, you worked on the CheXNet paper, didn't you? Yes, several of the TAs are co-authors on these papers), it may be that work done here at Stanford is taking the first steps: maybe, if we can improve the deep learning algorithms, pass the regulatory hurdles, and prove safety, that type of work happening here at Stanford on healthcare will have a transformative effect on how healthcare is run around the world. [00:48:11] The skills that you guys now have are a very unique set of skills. There are not that many people on the planet today who can apply learning algorithms and deep learning algorithms the way that you can, and a lot of the ideas
[00:48:27] that you learned in this class were invented in the last year or two, so there just has not yet been time for these ideas to become widespread. [00:48:35] And if I look at a lot of the most pressing problems facing society, be it lack of access to healthcare, or climate change (I spend a lot of time thinking about climate change), or whether we can improve access to education, or whether we can just make the whole of society run more efficiently, [00:48:53] I think that all of you have the skills to do very unique projects. And I hope that as you graduate from this class (I'm sure some of you will create businesses and make a lot of money, and that's great) [00:49:06] all of you will also take the unique skills you have to work on the projects that matter the most to other people, that help other people. Because if one of you does not take your
skills to do something meaningful, then there is probably some very meaningful project that just no one is working on, [00:49:23] because I think the number of meaningful projects greatly exceeds the number of people in the world today who are skilled at deep learning. That is why all of you have a unique opportunity to take these algorithms that you now know about and apply them to anything: [00:49:37] from developing novel chatbots, to improving healthcare (my team at Landing AI is improving manufacturing and agriculture, also some healthcare), to maybe helping with climate change, to helping with global education, and any other problems that really matter. [00:49:58] So I hope that all of you go on to do work that matters. [00:50:04] And then one last story. A few months ago I got to drive a tractor. It was very big, a little bit scary; it feels like a bigger machine than I should
be qualified to drive; it's a huge tractor. [00:50:24] And it turns out that when you drive a normal car, it's really clear which way is up on the steering wheel: you point the steering wheel up and your car drives forward. [00:50:35] For this huge tractor that I got to drive, it turns out it has this giant steering wheel, and to drive straight the giant steering wheel was just oriented at some weird angle; to turn right you turn it clockwise, to turn left you turn it anticlockwise, and that was that. [00:50:50] So it was a lot of fun. I drove the tractor, made a u-turn, drove back to where I started, did not hit anyone, there was no accident, [00:51:05] and then I climbed down off this giant tractor. And maybe I tell that story because I hope that even while you are doing
this important, maybe beneficial-to-other-people sort of work, I hope you also have fun. [00:51:20] I feel really privileged that, as a machine learning engineer, some days I get to go drive a tractor. [00:51:32] And one of the most exciting things: I feel like a lot of the biggest untapped opportunities for AI lie outside the software industry. I'm very proud of the work I helped do leading the Google Brain team and Baidu AI, and I think more people should do that type of work. [00:51:54] And I think that here in Silicon Valley many of you will get jobs in the tech sector, and that's great; we need more people to do that. [00:52:00] But I also think that if you look at all of human activity, the majority of human activity is actually outside the software industry; the majority of global GDP is
actually generated outside the software industry. [00:52:14] And I would just urge you, as you are considering what the most meaningful work is: consider the software industry, but also look outside the software industry, because I think the biggest untapped opportunities for AI really lie outside the software industry. [00:52:29] And we can't have everyone doing the same thing; that's actually not a healthy plan, if everyone works on improving web search, or even on improving healthcare. [00:52:40] I think we need a world where all of you who have these skills share these skills, teach other people what you've learned, and go out to do work that affects the software industry, affects other industries, affects for-profits, nonprofits, and governments, but uses these AI capabilities to lift up the whole human race. [00:52:58] And then finally, the last thing I want to say, on behalf of Kian
and me and the whole teaching team, is that I wanted to thank you for your hard work on this course. [00:53:10] I know that watching the videos, doing the homeworks on the website, and coming to the discussion sections, many of you have put a lot of work into this course. [00:53:24] And it wasn't so long ago, when I was a student, that I was staying at home doing homework or trying to derive some math thing. I've also taken some online courses myself, so it's actually not so long ago that I was sitting at a computer much like you, [00:53:37] watching some Coursera videos and then clicking on this, clicking on that, and answering things online. [00:53:42] And Kian and I and the whole teaching team appreciate all the hard work you put into this, and I hope that you got a lot out of your hard work, [00:53:55] and that you will take these rare and unique skills you now have and go on, and when you
[00:54:00] drive away from Stanford, or, for the home viewers as well as the in-classroom viewers, wherever you go, that you take these rare skills you now have and go on to do work that matters, and go on to do work that helps other people. [00:54:13] So with that, I look forward to seeing all of your projects at the poster session. I apologize in advance that we won't really be able to get a deep understanding in three minutes, but don't worry, we do read your project reports. [00:54:27] And I hope you are also looking forward to seeing everyone else's work at the poster session. [00:54:35] So with that, let me just say, on behalf of Kian and me and the whole teaching team: thank you all very much. [00:54:41] [Applause]

================================================================================
LECTURE INDEX.md
================================================================================

CS230 – Deep Learning (Andrew Ng)

Playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb

Total Videos: 10
Transcripts Downloaded: 10
Failed/No Captions: 0

---

Lectures

1. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 1 - Class Introduction & Logistics, Andrew Ng
   - Video: [https://www.youtube.com/watch?v=PySo_6S4ZAg](https://www.youtube.com/watch?v=PySo_6S4ZAg)
   - Transcript: [001_PySo_6S4ZAg.md](001_PySo_6S4ZAg.md)
2. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 2 - Deep Learning Intuition
   - Video: [https://www.youtube.com/watch?v=AwQHqWyHRpU](https://www.youtube.com/watch?v=AwQHqWyHRpU)
   - Transcript: [002_AwQHqWyHRpU.md](002_AwQHqWyHRpU.md)
3. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 3 - Full-Cycle Deep Learning Projects
   - Video: [https://www.youtube.com/watch?v=JUJNGv_sb4Y](https://www.youtube.com/watch?v=JUJNGv_sb4Y)
   - Transcript: [003_JUJNGv_sb4Y.md](003_JUJNGv_sb4Y.md)
4. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 4 - Adversarial Attacks / GANs
   - Video: [https://www.youtube.com/watch?v=ANszao6YQuM](https://www.youtube.com/watch?v=ANszao6YQuM)
   - Transcript: [004_ANszao6YQuM.md](004_ANszao6YQuM.md)
5. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 5 - AI + Healthcare
   - Video: [https://www.youtube.com/watch?v=IM9ANAbufYM](https://www.youtube.com/watch?v=IM9ANAbufYM)
   - Transcript: [005_IM9ANAbufYM.md](005_IM9ANAbufYM.md)
6. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 6 - Deep Learning Project Strategy
   - Video: [https://www.youtube.com/watch?v=G5FNYxbW_Qw](https://www.youtube.com/watch?v=G5FNYxbW_Qw)
   - Transcript: [006_G5FNYxbW_Qw.md](006_G5FNYxbW_Qw.md)
7. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 7 - Interpretability of Neural Network
   - Video: [https://www.youtube.com/watch?v=gCJCgQW_LKc](https://www.youtube.com/watch?v=gCJCgQW_LKc)
   - Transcript: [007_gCJCgQW_LKc.md](007_gCJCgQW_LKc.md)
8. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 8 - Career Advice / Reading Research Papers
   - Video: [https://www.youtube.com/watch?v=733m6qBH-jI](https://www.youtube.com/watch?v=733m6qBH-jI)
   - Transcript: [008_733m6qBH-jI.md](008_733m6qBH-jI.md)
9. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 9 - Deep Reinforcement Learning
   - Video: [https://www.youtube.com/watch?v=NP2XqpgTJyo](https://www.youtube.com/watch?v=NP2XqpgTJyo)
   - Transcript: [009_NP2XqpgTJyo.md](009_NP2XqpgTJyo.md)
10. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 10 - Chatbots / Closing Remarks
    - Video: [https://www.youtube.com/watch?v=IFLstgCNOA4](https://www.youtube.com/watch?v=IFLstgCNOA4)
    - Transcript: [010_IFLstgCNOA4.md](010_IFLstgCNOA4.md)