================================================================================
LECTURE 001
================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 1 - Class Introduction & Logistics, Andrew Ng
Source: https://www.youtube.com/watch?v=PySo_6S4ZAg

---

Transcript

[00:00:06] Okay, hey everyone, morning, welcome to CS230, Deep Learning. So many of you know that deep learning these days is the latest, hottest area of computer science, of AI. Arguably deep learning is the latest, hottest area of, you know, all human activity, maybe. But this is the course, CS230 Deep Learning, where we hope that we can help you understand the state of the art and become experts at building and applying deep learning systems. [00:00:40] Unlike many Stanford courses, this class will be more interactive than others, because this class runs in the flipped classroom format, where we'll ask you to watch a lot of the videos at home, a lot of the deeplearning.ai content hosted on Coursera, thus preserving the classroom
and discussion section time for much deeper discussions. [00:01:03] So to get started, let me first introduce our teaching team. The co-instructor is Kian Katanforoosh, who is actually one of the co-creators of the Deep Learning Specialization, the deeplearning.ai content that we're using in this class. As for the rest of the teaching team, Swati Dube is the course coordinator, and she has been working with me and others on coordinating, I guess, CS230, also CS229 and CS229A, to make all of these classes run well and give you a relatively smooth, you know, experience. [00:01:42] Younes Mourri is the course adviser, and he also worked closely with Kian and me in creating a lot of the online content that you'll use, and Younes is also head TA for CS229A, which some of you may also be taking. And then we have two co-head TAs: Aarti Bagul, who's worked on machine learning research for a long time, and [inaudible], who is still traveling
back, I think. [00:02:08] And also a large team of TAs. I think about half of the TAs in CS230 have previously TA'd, and their expertise spans everything from applying machine learning to problems in health care, or climate, to applying deep learning to problems in robotics, to problems in computational biology, and so on. So I hope that as you work on your projects this quarter, as part of CS230, you'll be able to get a lot of great advice and help and mentorship from all of the TAs as well. [00:02:46] So the plan for today is: I'm going to spend maybe a little bit of time sharing with you what's happening in deep learning, you know, why deep learning is taking off and how this might affect your careers, and then in the second half Kian will take over and talk a bit more about the projects you work on in this class, and not
just a final term project but, you know, the little machine translation system you build, the face recognition system you build, the art generation system you build, all of the many pretty cool machine learning and deep learning applications that you get to build throughout the course of this quarter, and also share with you the detailed logistics and the plan for the class. [00:03:27] Okay, so, let's see, all right, I'm going to just use the whiteboard for this part. So, um, [00:03:47] you know, deep learning, right, it seems like the media still can't stop talking about it, and it turns out that a lot of the ideas of deep learning have been around for several decades, right, the basic ideas of deep learning have been around for decades. So why is deep learning suddenly taking off now? Why is it, quote, coming out of nowhere, or whatever people say? I think that the main reason that deep learning has been taking off
and why, you know, suddenly all of you hopefully will be able to do really powerful things with it, much more effectively than two or three years ago, is the following. [00:04:29] Um, over the last couple of decades, with the digitization of society, we've just collected more and more data. So for example, all of us spend a lot more time on our computers and smartphones now, and whenever you do things on the phone, you know, that creates data, right? And what used to be represented through pieces of paper is now much more likely a digital record as well. [00:04:55] So if you go take an X-ray, at least in the United States, less so in some other countries, in developing economies, but at least in the United States there's a much higher chance now that your X-ray in the hospital is a digital image rather than a physical piece of film. Or if you order a new marker, right, there's a much higher
chance that the fact that you ordered the marker, you know, off a website is now represented as a digital record, compared to ten years ago, given the state of the global supply chain. Actually, if you ordered ten thousand markers, there's a much higher chance, you know, ten years ago, that the fact that you placed that order was stored on a piece of paper that someone scribbled on, saying ship ten thousand markers to Stanford. But now that's much more likely to be a digital record. [00:05:37] And so the fact that so many pieces of paper are now digital has created data, and for a lot of application areas the amount of data has sort of, you know, exploded over the last twenty years. [00:05:56] But what we found was that if you look at more traditional learning algorithms, traditional machine learning algorithms, the performance of most of them would plateau even as you feed them more and more
data. [00:06:16] So by traditional learning algorithms I mean logistic regression, support vector machines, you know, maybe decision trees, and so on, and it was as if all the learning algorithms didn't know what to do with all the data you could now feed them. But what we started to find several years ago was that if you train a small neural network, its performance may look like that; if we train a medium neural net, the performance may look like that; and if you train a very large neural net, you know, the performance kind of keeps on getting better and better, up to some theoretical limit called the Bayes error rate, which you'll learn about a bit later this quarter. Performance can never exceed 100%, but sometimes there's some ceiling in the performance below that. That's something we've been able to measure on many, many problems.
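The curves Andrew is sketching on the whiteboard can be imitated with a small stdlib-only simulation (mine, not from the lecture): a simple "memorizing" model is trained on data whose labels carry 10% irreducible noise, so its test accuracy climbs as the training set grows but saturates near the 90% ceiling set by the Bayes error rate.

```python
import random

random.seed(0)

NOISE = 0.1      # 10% of labels are flipped, so the Bayes error rate is 10%
N_INPUTS = 20    # inputs are integers 0..19; the clean label is x % 2

def sample(m):
    """Draw m (x, y) pairs with noisy labels."""
    data = []
    for _ in range(m):
        x = random.randrange(N_INPUTS)
        y = x % 2
        if random.random() < NOISE:
            y = 1 - y  # irreducible label noise
        data.append((x, y))
    return data

def fit(train):
    """High-capacity 'memorizing' model: majority-vote label per input value."""
    counts = {}
    for x, y in train:
        counts.setdefault(x, [0, 0])[y] += 1
    return {x: (0 if c[0] >= c[1] else 1) for x, c in counts.items()}

def accuracy(model, test):
    return sum(model.get(x, 0) == y for x, y in test) / len(test)

test = sample(10_000)
accs = {m: accuracy(fit(sample(m)), test) for m in (20, 500, 5000)}
for m, a in accs.items():
    print(f"m={m:5d}  test accuracy={a:.3f}")
```

With more training data, accuracy should rise and then flatten near 0.9: no amount of extra data pushes it past the noise floor, which is the plateau-at-Bayes-error behavior described above.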
And I think that across machine learning [00:07:08] and deep learning broadly, we've not yet hit the limits of scale, and by scale I mean the amount of data you can throw at the problem that's still useful for the problem, as well as the size of the neural networks. [00:07:23] And I think, you know, GPU computing was a large part of how we were able to go from training small to medium, and now to training very large neural networks. And once upon a time, I think, actually a lot of the early work on training neural networks on GPUs was done here at Stanford, right, using CUDA to train neural networks. But one of the lessons we learn over and over in computing is that yesterday's supercomputer is today's, you know, processor on your smartwatch, right? And so what used to be an amount of computation that was accessible only to
you know, large research labs, in Stanford they could spend a hundred thousand dollars on GPUs; today you could rent that on the cloud relatively inexpensively. [00:08:10] And so the availability of relatively large neural network training capabilities has allowed really, students, you know, almost everyone, well, not everyone, but many, many people, to have access to enough computational power to train large enough neural networks to drive very high levels of accuracy for a lot of applications, right. [00:08:35] And it turns out that if you look broadly across AI, you know, I think the mass media, right, newspapers, reporters, use the term AI; I think within academia or within industry you tend to say machine learning and deep learning. But if you look broadly across AI, it turns out that AI has many, many tools that go beyond machine learning, even beyond deep
learning. And if any of you take, you know, CS221, right, Stanford's AI class, great class, you learn about a lot of these other tools of AI. But the reason that deep learning is so valuable today is that if you look across many of the tools of AI, let's say, you know, there's deep learning slash machine learning. And again, you know, neural networks and deep learning mean almost exactly the same thing, right; it's just that as we started to see deep learning rise over the last several years, we found that deep learning was just a much more attractive brand, and so, you know, that's the brand that took off. [00:09:48] But if you take an AI class, you look broadly across the portfolio of tools you have in AI. I think that, you know, I'll often use deep learning, machine learning; sometimes I'll also use a probabilistic graphical model, right, which you learn about in
CS228, also a great course. [00:10:04] Sometimes I use a planning algorithm, you know; when I'm working on a self-driving car, right, you need a motion planning algorithm, you need various planning algorithms. Sometimes I use a search algorithm; sometimes I use knowledge representation. This is one of the technologies, knowledge graphs especially, that is widely used in industry but, I think, often underappreciated in academia. [00:10:29] If you do a web search and a web search engine pulls up a hotel and the list of room prices, and whether there's Wi-Fi, whether there's a swimming pool, that's actually a knowledge graph, a knowledge representation. So it's actually used by many companies, these large databases, but it's maybe underappreciated in academia. Or sometimes I even use game theory.
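The hotel example amounts to querying (subject, relation, object) triples. Here is a minimal sketch of that idea; the hotel name, relations, and values are invented for illustration, not taken from the lecture or any real system:

```python
# A toy knowledge graph stored as (subject, relation, object) triples, the kind
# of structured record behind a search result listing a hotel's prices and amenities.
TRIPLES = [
    ("Hotel Azul", "type", "hotel"),
    ("Hotel Azul", "room_price", "$120/night"),
    ("Hotel Azul", "has_amenity", "wifi"),
    ("Hotel Azul", "has_amenity", "swimming pool"),
    ("Cafe Verde", "type", "cafe"),
]

def query(subject, relation):
    """Return every object linked to `subject` via `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

print(query("Hotel Azul", "has_amenity"))  # -> ['wifi', 'swimming pool']
print(query("Hotel Azul", "room_price"))   # -> ['$120/night']
```

Production knowledge graphs index billions of such triples for fast lookup, but the query pattern is essentially the same.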
So if you learn about AI, there is a very large portfolio of many different tools you will see. [00:10:56] But what has happened over the last several years is, if you go to a conference on probabilistic graphical models, right, if this is time and this is performance, you see that, you know, every year probabilistic graphical models work a little bit better than the year before. If you go to the UAI conference, the Uncertainty in AI conference, maybe one of the leading conferences, maybe the leading one, on PGMs, you see that every year, you know, researchers publish papers that are better than the year before; the field is steadily marching forward. Same for planning: if you go to AAAI or something, you see, you know, the field is advancing, search algorithms are getting better. [00:11:39] Knowledge representation is getting better, game theory algorithms are getting better, and so the field of AI marches
forward across all of these different disciplines. [00:11:47] But the one that has taken off, you know, incredibly quickly is deep learning, machine learning. And I think a lot of this progress was initially driven by scale, scale of data and scale of computation, and the fact that we can now get tons of data, train a huge neural network, and get good performance. But more recently it has also been driven by the positive feedback loop of seeing early traction in deep learning, thus causing a lot more people to do research on deep learning algorithms, and so there's been tons of algorithmic innovation in deep learning in the last several years, and you hear a lot about algorithms that were, you know, relatively recently invented, as well, right. [00:12:32] And so really, I think that initially the twin forces of scale of data and scale of computation, and now the triple forces that also include a lot of
algorithmic innovation, and massive investment, are continuing to make deep learning make tremendous progress. [00:12:46] And so in CS230 we have, you know, I think two main goals. The first is to have you become expert in the deep learning algorithms, have you learn the state of the art, have you have deep technical knowledge of the state of the art in deep learning. And second is to give you the know-how to apply these algorithms to whatever problems you want to work on. [00:13:22] So one of the things I've learned, so I think, you know, actually some of you know my history, right: so, you know, I worked at Stanford for a long time, then started and was leading the Google Brain team, which did a lot of projects at Google, and I think the Google Brain team, you know, built from scratch, was arguably the leading force for helping Google go from what was already a
great internet company into, today, a great AI company. And then there's the time that I spent at Baidu in China, leading Baidu's AI group, which kind of helped Baidu go from also what was already a great company into, today, you know, what many people say is China's greatest AI company. [00:14:02] And I think through working on many projects at Google, many projects at Baidu, and now leading Landing AI, helping many companies on many projects, and running around to different companies and seeing the many different machine learning projects they have, I think I've been fortunate to learn a lot of lessons, not just about the technical aspects of machine learning but about the practical know-how of applying machine learning. [00:14:28] And I think that what you can learn from, you know, the internet, or from purely academic sources, or from reading research papers, is a lot of the technical aspects of
machine learning and deep learning. [00:14:40] But there are a lot of other practical aspects of how to get these algorithms to work that, actually, I do not know of any other academic course that kind of goes into great depth teaching, right? There might be one, but I'm not sure. [00:14:58] But one of the things that we hope to do in this class is to not just give you the tools but also give you the know-how on how to make them work, right. And I think, you know, I've spent a lot of time thinking about this. So actually, late last night, I stayed up very late last night reading this new book by John Ousterhout on, um, software architecture, right. And I think that there's a huge difference between, you know, a junior software engineer and a senior software engineer. Maybe everyone understands the C++ or the Python or the Java syntax; yeah, you can get that from, you know, a book. You just figure out, hey, this is
how c-plus [00:15:30] just figure out hey this is how c-plus this works inside job where else is how [00:15:32] this works inside job where else is how Python numpy works but it's often the [00:15:35] Python numpy works but it's often the high level judgment decisions of how the [00:15:38] high level judgment decisions of how the architecture system what abstractions do [00:15:41] architecture system what abstractions do you use how do you define interfaces [00:15:43] you use how do you define interfaces that defines the difference between a [00:15:45] that defines the difference between a really good software engineer versus you [00:15:47] really good software engineer versus you know a less experienced software [00:15:48] know a less experienced software engineer it's not understanding c-plus [00:15:50] engineer it's not understanding c-plus or syntax and I think in the same way [00:15:53] or syntax and I think in the same way today there are lots of ways for you to [00:15:56] today there are lots of ways for you to learn the technical tools of machine [00:15:59] learn the technical tools of machine learning and deep learning and you will [00:16:00] learning and deep learning and you will learn that in this class you know you [00:16:02] learn that in this class you know you learn how to train a neural network you [00:16:04] learn how to train a neural network you learn the latest optimization algorithms [00:16:06] learn the latest optimization algorithms you understand deeply what the content [00:16:08] you understand deeply what the content is whether recurrent neural network [00:16:10] is whether recurrent neural network whereas when lsdm is you you understand [00:16:12] whereas when lsdm is you you understand what intention mod allows you you learn [00:16:14] what intention mod allows you you learn all of these things in great detail your [00:16:16] all of these things in great detail your work impression could be vision [00:16:17] work impression could be 
vision nationally entrusting speech and so on [00:16:19] nationally entrusting speech and so on but I think one other thing that is [00:16:21] but I think one other thing that is relatively unique to this class and to [00:16:26] relatively unique to this class and to that I guess the the things you see on [00:16:29] that I guess the the things you see on the defender AI course are websites as [00:16:31] the defender AI course are websites as what's the things with doing cause is [00:16:33] what's the things with doing cause is trying to give you the practical [00:16:35] trying to give you the practical know-how so that when you're building a [00:16:37] know-how so that when you're building a machine learning system you can be very [00:16:38] machine learning system you can be very efficient in deciding things like should [00:16:42] efficient in deciding things like should you collect more data or not right and [00:16:44] you collect more data or not right and the answer is not always yes I think I [00:16:46] the answer is not always yes I think I think um with [00:16:48] think um with I think that many of us try to convey [00:16:51] I think that many of us try to convey the message that having more data is [00:16:54] the message that having more data is good right and that's actually more data [00:16:56] good right and that's actually more data pretty much never hurts but I think the [00:16:58] pretty much never hurts but I think the message of big data has also been [00:17:00] message of big data has also been overhyped and sometimes it's actually [00:17:02] overhyped and sometimes it's actually not worth your while to couldn't collect [00:17:03] not worth your while to couldn't collect more data right but so when you're [00:17:06] more data right but so when you're working a machine learning project and [00:17:08] working a machine learning project and if you are either doing it by yourself [00:17:10] if you are either doing it by yourself or leading a team your 
[00:17:12] ability to make a good judgment decision about whether you should spend another week collecting more data, or spend another week searching over hyperparameters or tuning the parameters of your neural network, is the type of decision that, if you make it correctly, can easily make your team 2x or 3x or maybe 10x more efficient. And so one thing we hope to do in this class is more systematically impart this type of knowledge to you.

[00:17:39] I actually visit lots of machine learning teams around Silicon Valley to learn about what they're doing, and recently I visited a company that had a team of 30 people trying to build a learning algorithm. The team of about 30 people had been working on the learning algorithm for about three months and had not yet managed to get it to work, so they'd basically not succeeded after three months. One of my colleagues took the data set... (oh yeah, can you pause the broadcasting? Don't say anything bad. All right.) So one of my colleagues took the data set home, spent one long weekend, three days, working on this problem, and was able to build a machine learning system that outperformed what this group of 30 people had been able to do in about three months. So is that a 10x difference? No, that's more than a 10x difference. And a lot of the difference between the great machine learning teams and the less experienced ones is actually not just whether you know how to implement an LSTM in TensorFlow or Keras or whatever.
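The "collect more data or tune the model" judgment call mentioned above is often made by looking at learning curves: train on increasing subsets of the data and watch how training and validation error move. Here is a minimal sketch in NumPy, using synthetic data and a plain least-squares model; this is my own illustration of the idea, not code from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task: y = 3x + noise (illustrative data only).
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=1000)

X_train, y_train = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]

def fit_mse(n):
    """Fit least squares on the first n training points; return (train, val) MSE."""
    A = np.hstack([X_train[:n], np.ones((n, 1))])        # design matrix with bias
    w, *_ = np.linalg.lstsq(A, y_train[:n], rcond=None)  # closed-form fit
    def mse(Xs, ys):
        pred = np.hstack([Xs, np.ones((len(Xs), 1))]) @ w
        return float(np.mean((pred - ys) ** 2))
    return mse(X_train[:n], y_train[:n]), mse(X_val, y_val)

sizes = [10, 50, 200, 800]
curve = {n: fit_mse(n) for n in sizes}
for n, (tr, va) in curve.items():
    print(f"n={n:4d}  train MSE={tr:.3f}  val MSE={va:.3f}")

# Reading the curve: if validation error has flattened out close to training
# error, another week of data collection likely buys little, and model or
# hyperparameter work is the better bet; a large, still-shrinking train/val
# gap suggests more data will help.
```

The same diagnostic works with any model; the point is that the data-versus-tuning decision can be made from a plot rather than by guessing.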
[00:19:17] You have to know that, but there are actually other things as well, and I think Kian and I and the teaching team are looking forward to trying to systematically impart a lot of this know-how to you, so that hopefully, someday, when you're leading a team of machine learning engineers or deep learning engineers, you can help direct the team's efforts more efficiently.

[00:19:38] Oh, actually, if any of you are interested: how many of you have heard of Machine Learning Yearning? Machine Learning Yearning? Wow, almost none of you. Okay, interesting. So, if this is your first machine learning class, it may be too advanced for you, but if you've had a little bit of machine learning background: Machine Learning Yearning is a booklet I've been writing; I've been working on it slowly in draft form.
[00:20:14] Machine Learning Yearning is my attempt to gather the best principles for turning machine learning from a black art into a systematic engineering discipline. I actually just finished the whole draft last weekend, and we'll email it to enrolled students, maybe later today, I'm not sure. So if you want a copy, go to the website and enter your email address, and when we send out the book draft you'll get a copy. I tend to write books and then just post them on the internet for free, but this one we'll just email out to people, so you can get it if you go to the website. And I think this course will talk about a lot of the principles of Machine Learning Yearning, but give you much more practice than just reading a book might.

[00:21:08] Um, so, let's see. Okay, so Kian will give a greater overview of what we'll cover in this class, but one of the principles I've learned as well is... so I think some of you know my background: I was a co-founder of Coursera, so I've spent a long time really thinking a lot about education, and I think CS230 represents Kian's and my and the teaching team's best attempt to deliver a great on-campus deep learning course.

[00:21:52] And so the format of this class is what's called a flipped classroom, and what that means is... you know, I've taught on SCPD for many, many years, and I found that even for classes like CS229 or other Stanford courses, often students end up watching the videos at home.
[00:22:18] And I think with the flipped classroom, what we realized was: if many students are watching videos of these lectures at home anyway, why don't we spend a lot of effort to produce higher-quality videos that are more time-efficient for you to watch at home? And so our team created the deeplearning.ai videos, kind of the best videos we knew how to create on deep learning, and they're now hosted on Coursera. I actually think it will be quite time-efficient for you to watch those videos, do the online programming exercises, and do the online quizzes. What that does is it preserves the class time, both the weekly sessions that we meet right here on Wednesdays and the TA discussion sections on Fridays, for much deeper interactions and much deeper discussions. And so the format of the class is that we ask you to do the online content created by deeplearning.ai and hosted on Coursera.
[00:23:17] Then in class, both in the meetings with Kian and me, which Kian and I will split roughly 50/50, as well as in the deeper small-group discussion sections you have with the TAs, you get to spend much more time interacting with the TAs, interacting with Kian and me, and going deeper into the material than with just the online content by yourself. That will also give us more opportunities to give you advanced material that goes beyond what's hosted online, as well as give you additional practice with these concepts.

[00:24:03] And so, let's see. Yeah, I'll finish up with two more thoughts and then hand it over to Kian. I think machine learning, deep learning, AI, whatever you call it, is changing a lot of industries. I think AI is the new electricity: much as the rise of electricity, starting about 100 years ago in the United States, transformed every industry. The rise of electricity transformed agriculture, because finally we had refrigeration, right? It transformed healthcare: imagine going to a hospital today where there's no electricity; how do you even do that? Computers, medical devices... you can't even run a healthcare system. It transformed communications, through telecom, through the telegraph initially, and now so much of communications really needs electricity. Electricity transformed every major industry, and I think machine learning and deep learning have reached a level of maturity where we see a surprisingly clear path for them to also transform pretty much every industry. And I hope that through this class, after these next ten weeks, all of you will be well qualified to go into these different industries and help transform them as well.

[00:25:18] And after this class, I hope that you'll be well qualified to, like, get a job at some of the big shiny tech companies that have large AI teams. But I think a lot of the most exciting work to be done today is still to go into the less shiny industries that do not yet have AI and machine learning, and to take it to those areas. Actually, earlier I was chatting with a student who works in cosmology... who was it? Oh, at the back. He was commenting that cosmology needs more machine learning, and maybe he will be the one to take a lot of the ideas of deep learning into cosmology. Because even outside the shiny tech areas... you know, maybe since I helped lead the AI transformation at two large tech companies, I feel like I'm done transforming internet search companies.
[00:26:09] And I think it's great that we have those great AI teams, like Google Brain, Baidu's AI group, and the great AI teams at other large tech companies; I think that's wonderful. But I think a lot of the important work to be done now, which many of you will do, is to take AI to healthcare, take it to cosmology, take it to all these other industries. I think all of this is worth doing: just like electricity didn't have one killer app, it's useful for a lot of things, and I think many of you will go out after this course and execute many exciting projects, both in tech companies and in other areas, like cosmology, that were not traditionally considered CS areas.

[00:26:53] So, to wrap up with two last thoughts. One of the things that excites me these days... I want to share with you one of the lessons I learned watching the rise of AI in multiple companies. I've spent a long time thinking about what it is that makes a great AI company, and one of the lessons I learned came from hearing Jeff Bezos speak about what it is that makes an internet company. I think a lot of the lessons that we learned from the rise of the internet will be useful: the internet was maybe one of the last major technology waves of disruption, and just as 20 years ago was a great time to start working on the internet, I think today is a great time to start working on AI and deep learning. (So, can we turn on the lights on this side as well? Do I control that? Okay, thank you.)

[00:27:54] Okay, so I want to show you one of the lessons. I really spent a lot of time trying to understand the rise of the internet, because I think it will be useful to many of you as you navigate the rise of machine learning and AI in your own upcoming careers. Which is: one of the lessons I learned was that you can take your favorite shopping mall and build a website for it, and that does not turn your shopping mall into an internet company. Right? You know, my wife and I like Stanford Shopping Center, and Stanford Shopping Center has a website, but even if a great shopping mall sells stuff on a website, there's a huge difference between a shopping mall with a website and a true internet company like Amazon. So what's the difference? About six or seven years ago I was chatting with the CEO of a very large American retailer, and at that time he and the CIO were saying to me: look, Andrew, we have a website, we sell things on the website; Amazon has a website, Amazon sells things on the website; it's the same thing. But of course it's not.
[00:29:08] And today, this particular large American retailer's future existence is actually a little bit in question, partly because of Amazon. So one of the lessons I learned, very much influenced by Jeff Bezos, is that what defines an internet company is not just whether you have a website; instead, it is whether you have organized your team, or your company, to do the things that the internet lets you do really well. For example, internet teams engage in pervasive A/B testing: we know that we can launch two versions of a website, just see which one works better, and so learn much faster. A traditional shopping mall can't launch two shopping malls in two parallel universes and see which one works better, so it's just so much harder to do that. We also tend to have short shipping times: you can ship a new product every day or every week, and so you learn much faster.
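The "launch two versions and see which one works better" experiment just described comes down to comparing two conversion rates and asking whether the gap is larger than noise. A minimal sketch using a two-proportion z-test; all visitor and conversion numbers below are hypothetical, not from the lecture:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    two website variants (the 'launch two versions' experiment)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical numbers: variant B converts 5.5% vs A's 5.0%, 20k visitors each.
z, p = two_proportion_ztest(1000, 20000, 1100, 20000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In practice a team would fix the sample size and a significance threshold (commonly p < 0.05) before the experiment, then ship the winning variant; this is one way the "learn much faster" loop gets made concrete.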
[00:30:08] Whereas the traditional shopping mall may redesign the shopping mall once every three months, right? And we actually organize our teams differently: we tend to push decision-making down to the engineers, or to the engineers and product managers, because in a traditional shopping mall things move slower; maybe the CEO says something and then everyone just does what the CEO says, and that's fine. But in the internet era we learned that the technology and the users are so complicated that only the engineers and the product managers are close enough to the technology, to the algorithms, and to the users to make good decisions. And so in internet companies we tend to push decision-making power down to the engineers and product managers, and you have to do that in the internet era, because that's how you organize a company, or organize a team, to do the things the internet lets you do really well.

[00:31:10] So that was the rise of the internet. I think we've arrived at the AI era, or the era of AI, machine learning, or deep learning, whatever you want to call it, and we're learning that if you have a traditional company plus a few neural networks, that does not by itself turn the company into an AI company. I think what will define the great AI teams of the future will be whether you know how to organize your own work, and organize your team's work, to do the things that modern machine learning and deep learning and other AI tools let you do really well. And having been at Google and Baidu... I think Google and Baidu are great, and many other companies are thinking this through, but I think even the best companies in the world haven't completely figured out what
are the [00:32:10] completely figured out what are the principles by which to organize AI teams [00:32:12] principles by which to organize AI teams but I think some of them will be that we [00:32:16] but I think some of them will be that we tend to I think that AI teams tend to be [00:32:20] tend to I think that AI teams tend to be very good at a strategic data [00:32:23] very good at a strategic data acquisition and so you see AI companies [00:32:30] acquisition and so you see AI companies or AI teams even even you know do things [00:32:33] or AI teams even even you know do things that may not seem like it makes sense [00:32:35] that may not seem like it makes sense and why do these companies of all these [00:32:37] and why do these companies of all these three products that don't make any money [00:32:38] three products that don't make any money well some of it is the required data [00:32:40] well some of it is the required data that you can monetize through other ways [00:32:43] that you can monetize through other ways right through advertising or through [00:32:45] right through advertising or through learning about users and so there are a [00:32:47] learning about users and so there are a lot of data acquisition strategies that [00:32:50] lot of data acquisition strategies that at the surface level may not make sense [00:32:52] at the surface level may not make sense but actually do make sense if you [00:32:53] but actually do make sense if you understand how this can be married with [00:32:55] understand how this can be married with deep learning algorithms to create value [00:32:57] deep learning algorithms to create value elsewhere and I think that uh AI [00:33:01] elsewhere and I think that uh AI companies tend to organize data [00:33:04] companies tend to organize data differently right ai teams tend to be [00:33:11] differently right ai teams tend to be very good at putting our data together I [00:33:13] very good at putting our data together I think 
[00:33:15] before the rise of deep learning, many companies had fragmented data warehouses, where if you have a big company with 50 different databases, you know, in 50 different divisions, it's actually very difficult for an engineer to look at all that data and put it together to train a learning algorithm to do something valuable. So the leading AI companies tend to have unified data warehouses. And I know we have a large home audience, our SCPD and other home audience here, so if any of you work at a large tech company, you know this is something that many companies are investing in today, to lay the foundation for learning algorithms. AI teams also tend to be very good at spotting pervasive automation opportunities, which is to say, very good at spotting opportunities where, instead of having people do a task, you could have a
[00:34:04] deep learning algorithm do it instead, automating the task. And we also have new job descriptions, which I don't have time to talk about, but just as with the rise of the internet, we started creating a lot of new roles for engineers. I think actually, once upon a time, the world was simple and there was just a software engineering title, but as technology got more complicated, we started to specialize. So that's why, you know, with the internet there were front end, back end, and mobile, right, and then, with increasing specialization of knowledge, other roles: QA, DevOps, IT. And so with the rise of machine learning, we're starting to see the creation of new roles like machine learning engineer and machine learning research scientist, and product managers on AI teams also behave differently than product managers at internet companies. And so one of
[00:35:02] the things we'll revisit a few times throughout this quarter is, and I don't mean to be too corporate, I know that many of you, you know, some of the SCPD audience or online audience, are already working at companies, and many of you, when you graduate from Stanford, will end up maybe starting your own company or joining an existing company, but I think that solving a lot of these questions of how to organize your teams effectively in the AI era will help you do more valuable work. And I think, to make one more analogy, you know, one of the things I hope Kian and I will share with you throughout this quarter is that, just as in the software engineering world, it took us a long time to figure out what agile development is, right, or what the pros and cons are of, you know, the waterfall model versus agile, or what a scrum process is, right, or is code review a good idea? It
[00:35:57] seems like a good idea to me, right? But these practices, after programming languages were created or invented or whatever, we still had to figure out all these ways to help individuals and teams write software effectively. And so if you've worked in, you know, high-performing corporate or industrial AI teams using these software engineering practices, everything from code review to agile to whatever, you know that having a team work effectively to write software is more than everyone knowing C++ syntax or knowing Python syntax. And I think in the machine learning world, we're still in the process of inventing these types of processes. What is the scrum, what is the agile development, what's the equivalent of code review for developing machine learning algorithms? And I think probably this class, more than any other
[00:36:51] class I'm aware of right now, I think, will try to systematically teach you these tools, so that you're not just able to derive a learning algorithm and implement a learning algorithm, but you're actually, you know, very effective in terms of how you go about building these systems.

So, last thing before I pass it to Kian: the other question that I've been asked, I guess, several times this week, let me just preemptively answer. So there are multiple machine learning classes going on at Stanford this quarter, and the other frequently asked question is which of these classes you should take. So let me just address that preemptively, because I've been asked twice already, and the other two classes are also running this quarter. So I think actually what's happened over the last several years at Stanford is that the demand for machine learning education has
[00:37:49] you know, been rising dramatically, because in recent years the majority of CS PhD applicants to Stanford, you know, are applying to do work in machine learning or applying to do work in AI. And I think all of you can kind of see that there's such a shortage of machine learning engineers, right, and I think that shortage should continue for a long time. So I think many people see that if you can learn machine learning, there will be great opportunities for you to do meaningful work: on campus, to take machine learning to other disciplines, or do great research on campus, as well as graduate from Stanford and do very unique work. When I wander around Silicon Valley, I feel like there are so many ideas for great machine learning projects that exactly zero people are seeing through, because there just aren't enough
[00:38:36] machine learning people in the world right now. So by learning these skills, you have many opportunities to be the first one to do something very exciting and meaningful, right? All right, and, um, you've probably read in the newspapers about how much money machine learning people make. I hope a lot of you make a lot of money, but I actually personally don't find that, you know, as exciting. I think that every time there's a major technological disruption, it gives us an opportunity to remake large parts of the world, and I hope that some of you go improve the healthcare system, improve the educational system, maybe, you know, see if we can help preserve the smooth functioning of democracy around the world. I think that really, your unique skills in deep learning will give you opportunities to do that, I
[00:39:22] think hopefully very meaningful work. Um, but because of this massive, massive rising demand for machine learning education: so, for a long time, CS 229, Machine Learning, was the core machine learning class at Stanford, and then CS 230 is actually the newest creation, I think, and the other class we're involved in this quarter is CS 229A. So if you're trying to decide which of these classes to take, I think that these classes are a little bit like Pokémon, right, you really should collect them all. But I think we've been trying to design these classes to actually teach different things and not have too much overlap, and so I have seen students take two classes at the same time, and that's actually fine. There's not that much overlap, so it's fine; you'll actually learn different things if you take
[00:40:26] any two of these classes at the same time. CS 229, Machine Learning, is the most mathematical of these classes; it goes much more into the mathematical derivations of the algorithms. CS 229A, Applied Machine Learning, is much less mathematical but spends a bit more time on the practical aspects; it's actually an easier on-ramp to machine learning, as well as the least mathematical of these classes. CS 230 is somewhere in between: more mathematical than 229A, less mathematical than 229. But what CS 230 focuses on is deep learning, which is just one small subset of machine learning, but it is the hottest subset of machine learning. Whereas there are a lot of other machine learning algorithms, you know, PCA, k-means, recommender systems, support vector machines, that are also very useful, that I use, you know, in my work quite frequently, that we don't teach in CS 230, but that are
[00:41:16] taught in CS 229 or CS 229A. So the unique thing about CS 230 is that it focuses on deep learning. So, I don't know, if you want to list deep learning on your resume, I guess maybe this is the easiest way to do it. I don't know, again, it's not what I tend to optimize for. But I think CS 230 goes the deepest in the practical know-how of how to apply these algorithms. Oh, and I want to set expectations accurately as well, right? So what I don't want is for you guys to go and complain at the end of the quarter that, you know, there wasn't enough math, because that's actually not the point. What has happened in the last decade is that the amount of math you need to be a great machine learning person has actually decreased, I think, and I wanted to do less math in CS 230 but spend more time teaching you the practical know-how of how
[00:42:07] to actually apply these algorithms, right? So, yeah, and I think 229A is the easiest of these classes, 229 is the most technical, and this is the most hands-on and applied; you do a lot of projects on different topics, right? And I think these courses are often the foundation, or some subset of them; these are often the foundational courses that students take, because if you want to, say, learn deep learning, a common sequence is for students to first, you know, learn the foundations of machine learning and deep learning, so you have the foundation first, which then often sets you up to later go deeper into computer vision or natural language processing or robotics or deep reinforcement learning. And so the common sequencing, the common tactic, that Stanford students take is to use these as the foundation. You see a bit of everything, from
[00:43:01] computer vision to natural language processing to speech recognition, you know, a little bit on self-driving cars, but that gives you the foundation to then decide if you want to go deeper into natural language processing or robotics or reinforcement learning or computer vision or something else. This is the common sequencing of classes that students take. Okay, so, um, look forward to spending this quarter with you. Let me just check for any quick questions and then hand it over to Kian.

[00:43:33] [Answering a question about a slide:] Decision making by engineers and product managers: I wrote decision making by engineers there, but really it's engineers and product managers. Oh, and pervasive, sorry, pervasive automation.

[00:44:00] [Question:] So far, what are the most meaningful successes of machine learning that you think have happened already?

So all of you are using
[00:44:12] learning algorithms, probably dozens of times a day, maybe even hundreds of times a day, without knowing it, right? Every time you use a web search engine, there's a learning algorithm that's improving the quality of the search results. There's also a learning algorithm trying to show you the most relevant ads, and this helps those companies actually make all the money. It turns out that actually both Google and Baidu have publicly said that over ten percent of searches on mobile are through voice search, and so I think it's great that you can now talk to your cell phone rather than type on the tiny little keyboard if you want to do a web search on mobile. If you go to, you know, a website like Amazon or Netflix, there are learning algorithms recommending more relevant movies or more relevant products to you. Every time you use your credit card, there's a
[00:45:00] learning algorithm, at probably almost all the companies I'm aware of, kind of figuring out whether it's you using the credit card or whether it's been stolen, you know, a learning algorithm to see if it's a fraudulent transaction or not. Every time you open up your email, the only reason email is even usable is because of your spam filter, which is a learning algorithm; they work much better now than before. And so, I think, you know, one of the amazing things about AI and machine learning is, I love it when it disappears into the background, right? You use these algorithms: you boot up your map application and it finds the shortest route for you to drive from here to there, and there's a learning algorithm predicting what traffic will be like on Highway 101 one hour from
[00:45:45] now. But you don't even need to think that there was a learning algorithm trying to figure out what traffic will be like one hour in the future. It seems pretty magical, right, that you could just use it. You can build all these wonderful products and systems that help people and abstract away a lot of the details. So that's the present. And I think in the near future: most of my PhD students, most of my research group here, work on machine learning for healthcare, where I think we'll make significant inroads, you know, and my team at Landing AI spends a long time with a lot of industries, from manufacturing to agriculture. Another thing I'm excited about is machine learning for education, to give people precisely tailored, recommended, customized content. There's fascinating research done here at Stanford by Chris Piech and a few others on using
[00:46:38] learning algorithms to give people feedback on coding homework assignments. So, sorry, there are so many examples of machine learning, I could talk for quite some time. Yeah, all right, one last question, then I'll hand it over to Kian.

[00:46:54] [Answering a question about the class format:] So the format of the class is that you watch videos created by deeplearning.ai on Coursera, so you'll see me a lot there, but in addition, Kian and I will be having lectures here in this classroom every Wednesday, and that will be, you know, completely new material that is not online anywhere, at least right now. Yeah, and then also, I think the point of the flipped classroom really is that for some of these things it's really more time efficient for you to just learn online. So there's the online content, but what it does is it leaves this classroom time for us to not, you know, deliver the same lecture year after year, but to spend time to get to
[00:47:36] know you, to have more time to answer your questions, and also to give you more in-class practice on these things, right? So there's the Coursera content, but what we do in CS 230 is augment that to give you much deeper practice, more advanced examples, some deeper mathematical derivations, and more practice, so that, you know, you deepen your knowledge of it. And with that, let me hand it over to Kian.

[00:48:14] Yeah, I'm going to get back at him by making noise while he's talking. Okay, okay. Thanks, Andrew. Hi everyone, I'm Kian. We're excited to have you here today, those of you who are in class but also those of you who are SCPD students. We wanted to take a little more time to explain a little bit about the course logistics, what this course is about, and also what it is to be a CS 230 student in Fall 2018. So the course online is structured into five chapters,
So the course online is structured into five chapters, or sub-courses, let's say. [00:49:00] What we will teach you first is: what is a neuron? That's the first thing you need to know. After understanding what a neuron is, you're going to build layers with these neurons, and you're then going to stack these layers on top of each other to build a network, which can be shallow or deep. This is the first course. Unfortunately, just building a neural network is not enough to deploy it — it's not enough to get it to work — so in the second course we're going to teach you the methods that are used to tune these networks in order to improve their performance. This is the second part. [00:49:36] As Andrew mentioned, one thing we're really putting a huge emphasis on in CS 230 is industrial applications and how the industry works in AI. So the third course is going to help you understand how to strategize the project that you'll do throughout the quarter, but also, in general, how AI teams work: you can have an algorithm, and you have to identify why the algorithm works, why it does not work, and, if it doesn't work, which parts inside the algorithm you should improve. [00:50:06] The two last courses, courses four and five, focus on two fields that are defined by two types of algorithms: on the one hand, convolutional neural networks, which have been proven to work very well on images or videos; and on the other hand, sequence models, which include recurrent neural networks, which are applied a lot in natural language processing and speech recognition. So you're going to see all of that. [00:50:34] From the online perspective, we use a specific notation in CS 230: when I say C2M3, it refers to Course 2, Module 3 — so the third module of Improving Deep Neural Networks. Okay.
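The course-one progression just described — a single neuron, then a layer of neurons, then layers stacked into a network — can be sketched in a few lines of NumPy. This is a hypothetical illustration of the idea, not code from the course or its assignments:

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic activation.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # 3 input features

# One neuron: a weighted sum of the inputs plus a bias, through an activation.
w, b = rng.normal(size=3), 0.1
neuron_out = sigmoid(w @ x + b)                 # a single number in (0, 1)

# A layer is many neurons sharing the same input: a matrix-vector product.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 4 neurons, 3 inputs each
layer_out = sigmoid(W1 @ x + b1)                # 4 activations

# Stacking layers gives a (shallow) network: 3 -> 4 -> 1.
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
network_out = sigmoid(W2 @ layer_out + b2)      # final prediction

print(layer_out.shape, network_out.shape)
```

Deeper networks just repeat the "multiply, add bias, activate" step more times; the tuning methods in the second course are about making that stack actually train well.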
And I'd like everyone to go on the CS 230 website after the class to look at the full syllabus for the quarter — check when the midterm is and when the final poster presentation is; the schedule is posted there, so check it out. [00:51:08] And we're going to use the Coursera platform, as you know. On Coursera you will receive an invite on your Stanford email — you should have received it already for Course 1 — in order to access the platform. From the platform you will be able to watch videos, do quizzes, and do programming assignments, and every time you finish one of these courses — so C1 has four modules; when you're done with C1M4 — you will receive a new invite to access C2, and so on. Okay. [00:51:37] Inside CS 230 we're going to use Piazza as the class forum for you to interact with the TAs and with the instructors; you can post privately or publicly, depending on the matter. Okay.
So let's see what one week in the life of a CS 230 student looks like — we're going to do that ten times over the Fall Quarter. So, what is one module? In a module you will watch about ten videos on Coursera, which will take about an hour and a half, and you will do quizzes after watching the videos — that's going to take you about 20 minutes per module. Finally, you will complete programming assignments, which are Jupyter notebooks: you will get cells to test your code, and you also submit your code directly on the Coursera platform. [00:52:27] In one week of class here at Stanford, we will usually have two modules. On top of these two modules, you will come to lecture for a one-and-a-half-hour in-class lecture on an advanced topic that is not taught online, and after that you will have TA sections on Fridays, which are around one hour — a good chance for you to meet other students for your projects.
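The Jupyter workflow mentioned above — fill in a function, then run the notebook's test cells before submitting — looks roughly like this. A hypothetical example of the pattern, not actual assignment content:

```python
# GRADED FUNCTION (hypothetical example of the notebook pattern)
def relu(x):
    """Return x for positive inputs and 0 otherwise — a common
    activation function implemented early in such assignments."""
    return max(0.0, x)

# A test cell like this typically follows each graded function,
# so you can check your code before submitting on the platform.
assert relu(3.0) == 3.0
assert relu(-2.0) == 0.0
print("All tests passed!")
```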
It's also a chance to interact with the TAs directly. [00:52:55] Finally, we also have personalized mentorship this quarter, where every one of you will meet 15 minutes per week with a TA, in order to check in on your project and give you the next steps. So we put a huge emphasis on the project in this class, and — you will see it later — we want you to decide on your teams by this Friday in order to get started as soon as possible; next week you will have your first mentorship meeting with the TAs. Okay, it's gonna be fun. [00:53:25] Assignments and quizzes that are part of modules are due every Wednesday at 11 a.m.
— so, 30 minutes before class, so you can come to class with everything done and understood. And do not follow the deadlines displayed on the Coursera platform; follow the deadlines posted on the CS 230 website. The reason the deadlines are different is that we want to allow you to have late days, and Coursera was not built for late days, so we put the deadlines later on Coursera to allow you to submit even if you want to use a late day. Does that make sense? Okay. [00:54:00] We're also going to use an interactive tool called Mentimeter to check attendance in class and also for you to answer some interactive questions — it's going to start next week; sorry, not with Course 2. [00:54:22] Regarding the grading formula, here it is: you have a small part on attendance, which is two percent of the final grade,
eight percent on quizzes, 25 percent on programming assignments, and a big part on the midterm and on the final project. This is posted on the website if you want to check it. Attendance is taken for in-class lectures, for the 15-minute TA meetings, and for the TA sections on Friday. [00:54:51] You can also get a bonus: we've had students who were very active on Piazza and answered questions for other students, which was great, and they got a bonus — so I encourage you to do the same; maybe we won't need TAs and instructors anymore. [00:55:11] Okay, so I wanted to take a little more time to go over some of the programming assignments that you're going to do this quarter, so that you know where you're going. In about three weeks from now, you're going to be able to translate these pictures here into the numbers that they correspond to in sign language — so it's sign-language translation, from images to the outputs they signify.
You're going to build — at first a logistic regression, and then a convolutional neural network — in order to solve this problem. [00:55:47] A little later, you're going to be a deep learning engineer in a house that is not too far from here, called the Happy House. There's only one rule in this house, and the rule is that no sad person should enter the house — you should avoid that. And because you're the only deep learning engineer who has the knowledge, you're given this task: don't let the sad people in, just let the happy people in. You're going to build a network that will run on a camera in front of the house and that is going to let people in or not. Unfortunately, some people will not get in, and other people will get in because they're happy, and you will save the Happy House at the end of the assignment, hopefully.
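The Happy House task described above is binary image classification. A minimal sketch of the logistic-regression version — flatten the image, take a weighted sum, squash with a sigmoid — using made-up weights and a hypothetical `is_happy` helper (illustrative only, not the assignment's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def is_happy(image, w, b, threshold=0.5):
    """Return True if the classifier lets this person into the Happy House.

    image: (H, W, 3) array of pixel values in [0, 1]
    w:     weight vector with one entry per pixel value
    b:     scalar bias
    """
    x = image.reshape(-1)                  # flatten (H, W, 3) -> (H*W*3,)
    probability_happy = sigmoid(w @ x + b)
    return probability_happy >= threshold

# Tiny made-up example: a 2x2 RGB "image" and random (untrained) weights.
rng = np.random.default_rng(1)
img = rng.random((2, 2, 3))
w = rng.normal(size=img.size)
print(is_happy(img, w, b=0.0))
```

A trained CNN replaces the single weighted sum with stacks of convolutions, but the final decision is the same thresholded probability.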
This is one of the applications that I personally prefer — it's called object detection; you might have heard of it. This is running in real time, and that's what is very impressive. You're going to work on a deep learning architecture called YOLO v2, and YOLO v2 is an object detection algorithm that runs in real time and is able to detect 9,000 object categories, as fast as that — so it's really, really impressive. You have a few links here if you want to check the paper already, but maybe you will need a few more weeks to understand it. [00:57:15] Okay — actually, we can even run it directly on my computer; I think it's going to be fine. [00:57:32] Oh yeah, we can run it. So here you see it's running live on this computer, and you see that if I move, it will find out that I moved — so I cannot escape. Yeah, here it is. Okay. [00:57:52] Okay, a few other projects. First, two weeks from now, you will build an
optimal goalkeeper shot predictor. So, in soccer, you're a goalkeeper and you want to decide where you should shoot the ball in order to make it land on one of your teammates; you're going to find the exact line on the field that tells the goalkeeper where to shoot. [00:58:11] About two weeks from now, in the fourth course on convolutional neural networks, you're going to work on car detection — this is a bigger image; this is exactly the programming assignment. You're going to work on the autonomous-driving application of finding cars, finding stop signs, finding lights, finding pedestrians, and all the objects that are related to road features. Okay, this is pretty cool, and you will generate these images yourself. This is a picture taken from a camera mounted at the front of a car, and it was generated by drive.ai.
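Object detectors like the YOLO v2 model in these assignments compare candidate boxes with intersection over union (IoU). A minimal sketch of that computation, with boxes written as (x1, y1, x2, y2) — an illustrative helper, not the assignment's code:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union 7 -> 1/7
```

Non-max suppression — keeping only the highest-scoring box among those with high mutual IoU — is what turns thousands of raw predictions into the clean boxes seen in the demo.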
You will also build a face-recognition system that is going to first do face verification — is this person the right person? — but also face recognition — who is this person? — which is a little more complex; we're going to go over that together, both online and in lecture. [00:59:09] Art generation — some of you have heard of this; it's an algorithm called neural style transfer, and again, we usually put the papers at the bottom of the slides in case you want to check them yourself for your project. This is a problem where you give a content image, which here is the Golden Gate Bridge, and a style image, which is an image that was painted — usually by someone — or an image from which you want to extract the style, and the algorithm is going to generate a new image: it's going to mix the content of the first image with the style of the second image. [00:59:44] Music generation, which is super fun: you're going to generate jazz music in the fifth course, Sequence Models.
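In neural style transfer, the "style" being mixed in is usually captured by Gram matrices — channel-to-channel correlations of a layer's activations — which the generated image is pushed to match, while a separate content loss compares raw activations with the content image. A minimal sketch with made-up activation shapes (illustrative only, not the assignment's code):

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of one layer's activations.

    features: (C, H, W) activations from some conv layer.
    Returns a (C, C) Gram matrix; matching these between the style
    image and the generated image is what transfers the "style".
    """
    c = features.shape[0]
    flat = features.reshape(c, -1)  # (C, H*W)
    return flat @ flat.T

# Made-up activations: a layer with 4 channels on an 8x8 feature map.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8, 8))
print(gram_matrix(acts).shape)  # (4, 4)
```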
In the same course you're also going to generate text: by giving it a huge corpus of poems written by Shakespeare a long time ago, you're going to teach the algorithm to generate poems as if they were written by Shakespeare — you can even write the first sentence, and it's going to continue and modify it. [01:00:11] You all have smartphones, and I guess you've noticed that when you write a sentence on your smartphone, it usually tells you what you should put next — and sometimes it's an emoji. You're going to do this part: you're going to implement the algorithm that takes an input sentence and tells you what emoji should come after it. [01:00:30] Machine translation is one of the applications that has been performing tremendously well with deep learning. You're going to implement not a full machine-translation system from one language to another, but a similar task that is just as
exciting: changing human-readable dates to machine-readable dates. [01:00:51] So, you know, let's say you're filling in a form and you're typing a date — the entity that gathers this data will have a hard time converting all these dates into a specific format. You're going to implement the algorithm that takes all these different dates in different formats and generates the right format — translating from human-readable to machine-readable dates. [01:01:14] And finally, trigger-word detection, which I also love, and which some of you have seen us build a year ago, I believe — an algorithm that Younes and Andrew and I have worked on. Trigger-word detection is the problem of detecting a single word: you know, you probably have objects from big companies that detect your voice and activate themselves on a trigger word.
You're going to build this algorithm for the trigger word "activate." Yeah — and many more projects that you will see. [01:01:49] Now, these are the things that you will all build in this course; every one of you will be led through the programming assignments, but you also have to choose your own project to work on throughout the course. These are examples of projects that CS 230 students have built in the past and which have worked very well. [01:02:07] One is coloring black-and-white pictures, using a neural network to map them to the color representation of their features. It's pretty cool, because we can now watch movies that were filmed in the 1930s or 1950s — or I don't know when — in color, which is super cool. [01:02:23] Predicting the price of an object from a picture: this was a great project in the first iteration of CS 230, where you give it a bike and it guesses how much the bike costs. So if you want to sell
stuff and you don't know for how much, you just give it a picture, and then you sell it at around that price. The student had actually implemented an algorithm to see which features of the bike are related to the price, so it was super fun to see whether it's the steering wheel, or the wheels, or the body of the bike that makes the bike expensive according to the algorithm. And many more. [01:03:03] So, last quarter specifically, we've had a lot of projects in physics and astrophysics and chemical engineering and mechanics, which was great. Some examples are detecting earthquake-precursor signals with a sequence model, and predicting an atom's energy based on its atomic structure. So you have, for instance, pieces of software that are really computationally expensive, which look at the atomic structure of an atom and output the energy of this atom — and this takes a long time.
These students tried to make it a three-second problem by running a neural network to find the energy instead. [01:03:42] Yeah — so you have a bunch of problems across industries: in healthcare, cancer and Parkinson's disorder detection — we've had a lot of these — and brain-tumor segmentation. Segmentation is the problem of classifying every pixel of an image: tell me which pixels correspond to the tumor, for example. [01:03:58] So we're really excited to see what you guys are going to build at the end of this quarter, and that's why we want you to build your team very quickly and get started, because the project is what you'll be proud of at the end of the quarter. We hope that you guys will come to the poster session proud of your poster, proud of the final project that you sent us, and that you can talk about it for the next ten years — or twenty, hopefully. And I guess Andrew can confirm that CS 229 students from
the past few years have done projects that are amazing today and have been featured around the world as research or industrial projects. [01:04:38] So, to sum up: in this course you will build a wide range of applications — it's very applied; there is some math, but less than in CS 229 and more than in CS 229A — and you have access to personalized mentorship thanks to the amazing TA team and the instructors. And finally, you will get to build a ten-week-long project. [01:05:05] So now we get to the serious thing: what we are up to this week. At the end of every lecture you'll have one slide that reminds you what you have to do for next week — next Wednesday, 11:00 a.m. So: create
So create your Coursera account based on the invite that you received. If you didn't receive an invite, send us a private post on Piazza and we will send it again. Finish the first two modules of Course 1, C1M1 and C1M2; that corresponds to two quizzes, two programming assignments, and around 20 videos, which are listed here. And for Friday, meaning two days from now, by the end of the day: find project teammates and fill in the form to tell us who your teammates are; it's going to help us find your mentor. [01:05:57] Finally, there is a TA section this Friday too. No project mentorship yet, that will start next week, but we will see you on Friday.

[01:06:04] I'm going to take a few questions, so shout them out. Yes?

[01:06:09] Yeah, these times are going to be posted at the end of this... For the TA sections, we're going to have a large range of TA sections on Friday, at basically every time, and you're going to be assigned to one of them. If you want to move, you can send a private Piazza post to us to be moved to another section.

[01:06:37] How big is the team? Usually it's from one to three students; exceptionally we would accept four students if the project is challenging enough. Yes?

[01:06:49] So, it is possible to combine the project with other classes; it means it's been done in the past. What we want is for you to give us a project and a poster that is framed as CS230 wants it, and you discuss it with us in order for us to validate whether you can merge this project with another class, because it requires deep learning, of course. You're not supposed to combine this project with something that doesn't have deep learning in it.

[01:07:17] Okay, all right, one more question. So you can retake the quizzes as much as you want on Coursera; we will
consider the last submitted quiz for this class. Okay? So you can resubmit if you didn't get full marks, yeah. Okay, thanks guys, and see you on Friday!

================================================================================ LECTURE 002 ================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 2 - Deep Learning Intuition
Source: https://www.youtube.com/watch?v=AwQHqWyHRpU
---
Transcript

[00:00:05] Hello everyone, welcome to the second lecture of CS230. So as I said earlier, you can go on menti.com from your smartphones or your computers and enter this code, 845709. We will use this tool for interactive questions during the lecture, and we will also use it to track attendance. I'll add it at the end of the lecture, but if you have time, do it now. Let's start the lecture while you guys are doing that.

[00:00:42] Okay, so today's lecture is going to be about deep learning intuition, and the goal is to give you a systematic way to think about projects, everything related to deep learning. It includes how to collect your
data, how to label your data, how to choose an architecture, but also how to design a proper loss function to optimize. All of these are decisions you are going to have to make during your projects, and we'll try to give you an overview of this systematic way of thinking for different projects. It's going to be more high-level than other lectures, but we hope it gives you a good start for your project.

[00:01:20] We'll start with a ten-minute recap of what you've seen in the first week about neural networks. As you know, you can think of machine learning, and deep learning in general, as modeling a function that takes an input, which can be an image, speech, natural language, or a CSV file; you give it to a box and get an output, which can be a classification: is it a cat? Is there a cat in this image, output one, or is there no cat in this image, output zero?

[00:01:57] And I think a good way to remember what a model is, is to define it as architecture plus parameters. The architecture is the design that you choose: logistic regression is the first one you've seen; you will see shallow neural networks and deep neural networks, then you will see convolutional neural networks and recurrent neural networks. These are all types of architectures, and you can choose to make them deeper or shallower. The parameters are the core part: the numbers that make your function take this cat as input and convert it to an output. These are millions of numbers, and the goal of machine learning and deep learning is to find all these numbers; we're all trying hard to find numbers, basically, millions of numbers in matrices.

[00:02:38] If you give it this cat and forward-propagate it, so we propagate it through the model to get an output, you will have to compare
this output to the ground truth. The function used to do so is called the loss function. You've seen an example of a loss function this week, the logistic loss function; we will see more loss functions later on. Computing the gradient of this loss function is going to tell you how much you should move your parameters in order to make the loss go down, so in order to make this function recognize cats better than before. You do that many, many times until you find the right parameters to plug into your architecture; you can then give it your cat and get an output.

[00:03:26] What is very interesting in deep learning is that many things can change. You can change the input: we talked about natural language, speech, and structured data in general. You can change the output: it can be a classification algorithm, it can be a multi-class algorithm. I can ask you to give me the breed of the cat instead of just asking whether there is a cat, which makes the problem more complicated. It can also be a regression problem: I give you the cat and I ask you to give me the age of the cat, which is much more complicated again. Does that make sense?

[00:03:59] Okay. Another thing that can change is your architecture; we talked about it earlier. And finally, the loss function. I think the loss function is something that people struggle with, to understand what loss function to choose for a specific project, and we're going to put a huge emphasis on that today. [00:04:16] And of course, in the architecture you can change the activation functions, and in this optimization loop you can choose specific optimizers. We're going to see, in about three weeks, all the optimizers: Adam, stochastic gradient descent, batch gradient descent, RMSprop, and momentum.
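The training loop just described (forward-propagate, compare to the ground truth with the logistic loss, move the parameters against the gradient, repeat) can be sketched in a few lines of NumPy. The synthetic data, learning rate, and step count below are made up for illustration; only the loss and its gradient follow the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy, linearly separable data (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # 200 examples, 4 features
true_w = np.array([1.5, -2.0, 0.5, 1.0])
y = (X @ true_w > 0).astype(float)             # synthetic 0/1 labels

w = np.zeros(4)                                # the parameters we try to find
b = 0.0
lr = 0.5                                       # learning rate (a hyperparameter)

for step in range(500):
    y_hat = sigmoid(X @ w + b)                 # forward propagation
    # Logistic (cross-entropy) loss, averaged over the batch.
    loss = -np.mean(y * np.log(y_hat + 1e-12)
                    + (1 - y) * np.log(1 - y_hat + 1e-12))
    dz = (y_hat - y) / len(y)                  # gradient of the loss w.r.t. z
    w -= lr * (X.T @ dz)                       # nudge the parameters so the
    b -= lr * dz.sum()                         # loss goes down

print(f"final loss: {loss:.4f}")               # far below the initial ~0.693
```

With w = 0 the predictions are all 0.5 and the loss starts at ln 2 ≈ 0.693; each pass of the loop moves w and b a little against the gradient, which is exactly the "do that many, many times" in the lecture.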
And finally, all the hyperparameters: what is the learning rate of this loop that I'm using for my optimization? We're going to see all of that together, but there's a bunch of things that can change in this scheme. Any questions on that in general? So far so good?

[00:04:52] Okay, so let's take the first architecture that we've seen together, logistic regression. As we know, an image in computer science can be represented by a 3D matrix; each matrix represents a certain color, RGB: red, green, blue. We can take all the numbers from this 3D matrix and put them in a vector; we flatten it in order to give it to our logistic regression. We forward-propagate it: we multiply it by W, which is our parameter, add B, which is our bias, give it to a sigmoid function, and get an output. If the network is trained properly, we should get a number that is more than 0.5 here, to tell us that there is a cat in this image.
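This forward pass can be sketched directly: flatten the 3D RGB matrix into one long vector, multiply by W, add the bias b, and apply the sigmoid. The 64x64 image size and the random parameters here are placeholders; a real classifier would use a trained W and b, so the 0.5 threshold is only meaningful after training.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3))     # stand-in for a 64x64 RGB image
x = img.reshape(-1) / 255.0                      # flatten: (64*64*3,) = (12288,)

W = rng.normal(scale=0.01, size=(1, x.size))     # parameters (random, untrained)
b = 0.0                                          # bias

z = W @ x + b                                    # linear part of the forward pass
y_hat = 1.0 / (1.0 + np.exp(-z))                 # sigmoid -> P(cat), shape (1,)

print(x.shape, float(y_hat[0]))                  # a probability strictly in (0, 1)
```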
[00:05:31] So this is the basic scheme. Now my question for you is: if I want to do the same thing, but I want to have a classifier that can classify several animals, so in the image there could be a giraffe, there could be an elephant, or there could be a cat, how would you modify this architecture? Yes?

[00:06:04] Exactly, so that's a good point: we could add several units, so several neurons, one for each animal, and we would call it multi logistic regression. So it could be something like that. We have a full connection here; before, all the inputs were connected to this one neuron, and now we added two neurons, and each neuron is going to be responsible for one animal. How do we know which neuron is responsible for which animal? Is the network going to figure it out on its own, or do we have to help it?

[00:06:43] Exactly, the label is important. What is going to tell your model "this neuron should focus on cat, this one should focus on elephant, this one should focus on giraffe" is the way you label your data. So how should we label this data now, if we were to do this specific task? Any ideas?

[00:07:06] One-hot vector, okay. So a one-hot vector means a vector with all zeros and a single one. Any other ideas? One, two, three... so I assume you're saying that each integer would correspond to a certain animal. Okay, any other ideas? Modifying the loss function? You want to put more weight on one animal, so you modify the loss function.

[00:07:47] I see, we don't want that, concretely. So I agree with the one-hot encoding, but I think there's a downside to one-hot encoding. What is the downside of the one-hot encoding?

[00:08:04] Yes, so you're saying that if we have a lot of animals, the labels only contain zeros and a single one, so there's a huge imbalance. I don't think that's an issue, because these neurons are independent from each
other right now. So yeah, it could run into an issue if you really have a lot of animals, that's true, but there is another problem with it. The problem is: do you think, if you one-hot encode your labels, you would be able to detect an image with a giraffe and an elephant in it? You would not be able to do so; you need the multi-hot encoding. So in this case, if there is a cat in the image, I would use a one-hot label, I would say 0 1 0, but if I have a dog and a cat in the image, I would say 1 1 0, okay? [00:08:53] The one-hot encoding works very well when you have the constraint of having only one animal per image, and in this case you would not use the activation function called sigmoid; you would use another one, which is softmax. Yeah, the softmax function, which we're going to see together, and for those of you who took CS229, you've probably heard of it.
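The labeling schemes from this discussion can be written out directly. A minimal sketch; the class ordering [dog, cat, giraffe] is my choice here so the vectors match the 0 1 0 and 1 1 0 examples, and softmax is included since it replaces the sigmoid in the one-animal-per-image case.

```python
import numpy as np

classes = ["dog", "cat", "giraffe"]   # assumed ordering, matching 0 1 0 for a cat

def one_hot(name):
    """One animal per image: pair this labeling with a softmax output layer."""
    v = np.zeros(len(classes))
    v[classes.index(name)] = 1.0
    return v

def multi_hot(names):
    """Several animals may appear: pair this with independent sigmoid units."""
    v = np.zeros(len(classes))
    for n in names:
        v[classes.index(n)] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())           # subtract the max for numerical stability
    return e / e.sum()

print(one_hot("cat"))                 # [0. 1. 0.]
print(multi_hot(["dog", "cat"]))      # [1. 1. 0.]
print(softmax(np.array([1.0, 2.0, 3.0])))  # three probabilities summing to 1
```

Softmax forces the outputs to compete (they sum to 1), which is exactly why it fits the "only one animal per image" constraint, while independent sigmoids let several classes fire at once.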
Okay, so what I wanted to explain here is that the way you choose your labeling is very important, and it's a decision you should make prior to starting the project, okay?

[00:09:23] In terms of notation, in this class we're going to use the following: a with a square bracket 1 denotes the activations of the first layer. So with the square bracket we denote the layer, and with the subscript we denote the index of the neuron in the layer, okay? And of course you can stack these neurons on top of each other to make the network more complex, depending on the task you're solving.

[00:09:49] Now, the concept I wanted to introduce in this recap is the concept of an encoding. Some of you have probably seen this image before. If you have a network that is not too shallow, you would notice that what the first neurons see are very precise representations of the data, so there are pixel-level
representations of the data. x_3, for example, is probably one of the three channels of the 3D matrix, just one number, so what this neuron sees is going to be a pixel-level representation of the image, okay? What this neuron sees, the one in the second layer, the hidden layer, is the representation output by all the neurons in the first layer. These are going to be more high-level, more complex, because the first neurons see pixels and they're going to output slightly more detailed information, like "I found an edge here, I found an edge there", and so on, and give it to the second layer. The second layer is going to see more complex information; it's going to give it to the third layer, which is going to assemble some high-level, complex features that could be eyes, nose, mouth, depending on what network you've been training. So this is an extraction of what's happening in each layer when the network was trained on face recognition.
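The layer-by-layer picture above can be made concrete by running an input through a tiny fully connected network and keeping each layer's activations a[1], a[2], a[3]. The layer widths and random weights below are made up, so these representations are untrained; the point is only that every layer outputs a vector, and that vector is the layer's representation of the input.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [12, 8, 4, 2]                      # made-up widths: input, two hidden, output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

def relu(z):
    return np.maximum(z, 0.0)

x = rng.normal(size=sizes[0])              # stand-in for a flattened image
activations = []                           # a[1], a[2], a[3] in the lecture's notation
a = x
for W, b in zip(Ws, bs):
    a = relu(W @ a + b)                    # one layer: linear map + activation
    activations.append(a)

print([v.shape for v in activations])      # [(8,), (4,), (2,)]
```

Reading off `activations[0]` gives the low-level representation (the "edges" level in the lecture's story), while the later entries are the deeper, more abstract ones.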
[00:11:07] Yes? ... Oh, I see: like, if you have a fully connected network. That's true, this type of visual is more often observed in convolutional neural networks, because these are filters, but this happens also in this type of network; it's just harder to visualize.

[00:11:34] Okay, so this is what we call an encoding. It means: if I extract the information from this layer, so all the numbers that are coming out of these edges, I will have a complex representation of my input data. If I extract the numbers that are at the end of the first layer, I will have a lower-level representation of my data, which might be edges, okay? We're going to use these encodings throughout this lecture. Any questions on that?

[00:12:05] Okay, so let's build intuition on concrete applications. We're going to start with a short warm-up with day-and-night classification, and then quickly move to face verification and
[00:12:15] quickly move to face verification and face recognition and after that we'll do [00:12:18] face recognition and after that we'll do some art generation and finish with a [00:12:20] some art generation and finish with a trigger word detection if we have time [00:12:22] trigger word detection if we have time we will talk about how to ship a model [00:12:24] we will talk about how to ship a model which is shipping architecture plus [00:12:26] which is shipping architecture plus parameters [00:12:27] parameters okay we're done fascist as I said on the [00:12:31] okay we're done fascist as I said on the architecture that lost the training [00:12:32] architecture that lost the training strategy to help you make decisions [00:12:34] strategy to help you make decisions during your project so let's start with [00:12:36] during your project so let's start with the first game we're given an image and [00:12:39] the first game we're given an image and we have to build a network that tells us [00:12:42] we have to build a network that tells us if the image is taken during the day [00:12:45] if the image is taken during the day label zero or was taken at night label [00:12:49] label zero or was taken at night label one so first question is what data set [00:12:55] one so first question is what data set do we need to collect okay labeled image [00:13:07] do we need to collect okay labeled image is captured during the day and during [00:13:09] is captured during the day and during the night I agree [00:13:11] the night I agree so probably oh yeah let me ask the [00:13:14] so probably oh yeah let me ask the question how many images that was wrong [00:13:17] question how many images that was wrong acting how many images like how do you [00:13:22] acting how many images like how do you get this number [00:13:26] can someone give me an estimate of how [00:13:28] can someone give me an estimate of how many images you need in order to solve [00:13:30] many images you need in 
order to solve this problem, and explain how you got this estimate?

[00:13:38] So you're saying a number similar to the number of parameters you have in the network. I think it's better to think of it the other way around: the network comes after. Right now you don't know what network you will use, so you cannot decide the number of data points based on your parameters. Later on, based on how flexible your network is, you can add more data, and that's probably what you meant, but at first you want to get a number.

[00:14:08] Yeah, more images than pixels within an image? I don't think that has anything to do with the pixels in the image. You can have a very simple task, like you only have images that are red and green and you want to classify red versus green; the image can be giant, you can have a lot of pixels, and it's not going to change the number of data points you need.

[00:14:31] Okay, so you're talking about computational resources. So the more images we have, probably the more computational resources we will need; so yeah, there's something like that. I think in general you want to try to gauge the complexity of the task. So let's say we did a problem that was cat recognition: detect whether there is a cat in an image or not. In this problem, we remember that with 10,000 images we managed to train a pretty good classifier. How do you compare this problem to the cats problem? Do you think it's easier or harder?

[00:15:07] Easier, yeah, I agree, it's probably easier. So in terms of complexity, this task looks less complex than the cat recognition task, so you will probably need less data. That's a rule of thumb. The second rule of thumb, and why I get to this image, is: what do we exactly want to do? Do we want to classify pictures that were taken outside, which seems even easier, or do we also want the network to
Do we want to classify pictures that were taken outside, which seems even easier, or do we also want the network to classify complicated pictures? What do I mean by complicated pictures? Pictures inside your house. So let's say in a picture you have a window on the right side: a human will be able to say it's the day because I see the window, but the network is going to take much longer to learn that, much longer than for pictures taken outside. What else, what are other complicated ones? Right, sunrise and sunset. In general that's complicated because you have to define it, and you have to teach your network what it means: is it night or day? [00:16:07] Okay, so depending on what task you want to solve, that's going to tell you if you need more data or less data. I think for this task, if you take outside pictures, 10,000 images is going to be enough, but if you want the network to detect indoor scenes as well, you probably need a hundred thousand images or so.
And this is based on comparing with projects you did in the past, so it's going to come with experience. [00:16:32] Now, as you know, when you have a dataset, you need to split it between train, validation, and test sets. Some of you have heard that; we're going to see it together in more depth. You need to train your network on a specific set and test it on another one. How do you think you should split these 10,000 images? 50/50 between training and test? 80/20? I think we would go more towards 80/20, because the test set is made to analyze whether your network is doing well on real-world data or not, and I think 2,000 images is probably enough to get that sense. You also want to put complicated examples in this test set. So I would go towards 80/20, and the bigger the dataset, the more I would put in the train set: if I have 1 million images, I would put even more, maybe 98%, in the train set and 2% to test my model.
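The split rule of thumb above can be sketched in a couple of lines of Python (illustrative only; the function name is not from the lecture):

```python
def split_sizes(n_images, test_fraction):
    """Return (train, test) counts for a given test fraction."""
    n_test = int(n_images * test_fraction)
    return n_images - n_test, n_test

# 10,000 images with an 80/20 split: 2,000 images held out for test
train_small, test_small = split_sizes(10_000, 0.20)   # (8000, 2000)

# 1,000,000 images: a 98/2 split still leaves 20,000 test images
train_big, test_big = split_sizes(1_000_000, 0.02)    # (980000, 20000)
```

The point is that the test set only needs to be big enough to estimate real-world performance, so its fraction shrinks as the dataset grows.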
[00:17:24] Okay, now I wrote bias here. What do I mean by bias? Yes: you need the correct balance between classes. You don't want to give 9,000 dark images and 1,000 day images; you want a balance between the two, to teach your network to recognize both classes. [00:17:48] Okay, what should be the input of your network? The pixel image, yeah. So this is an example of an input image: it's the Louvre Museum during the day. Harder question: what should be the resolution of this image, and why do we care? [00:18:20] That's great. So she said, and I'll repeat it for SCPD students as well: as low as you can while still achieving good results. Why do we want low resolution? Because in terms of computation it's going to be better. Remember, if I have a 32 by 32 image, how many pixels are there? If it's color, I have 32 times 32 times 3. If I have 400 by 400, I have 400 times 400 times 3, which is a lot more.
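A quick sketch of the pixel counts being compared (illustrative, not lecture code):

```python
def input_size(height, width, channels=3):
    """Number of raw input values for an image at a given resolution."""
    return height * width * channels

small = input_size(32, 32)      # 32 * 32 * 3   = 3,072 values
large = input_size(400, 400)    # 400 * 400 * 3 = 480,000 values
gray = input_size(400, 400, 1)  # dropping color divides the count by 3
```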
[00:18:48] So I want to minimize the resolution while still being able to achieve good performance. What does it mean to still achieve good performance, and how do I get this number? [00:19:05] Okay, a resolution similar to what you expect the algorithm to work on in real life; yeah, probably, I agree. What else? What other rule of thumb can you use in order to choose this resolution? [00:19:24] Great idea: compare to human performance. So there's one way to do it, which is the brute-force way: we would train models at different resolutions and then compare their results. Or you can be smart and use human performance as a comparison. I would print this image, or several images like this, at different resolutions on paper, and I would go to humans and say: classify those, classify those, and classify those.
And I would compare human performance across these three resolutions, in order to decide the minimum resolution I can use and still get perfect human performance. [00:19:58] By doing that, I got that 64 by 64 by 3 was enough resolution for a human to detect whether an image is taken during the day or during the night. This is a pretty small resolution in imaging, but this seems like an easy task. If you have to find the breed of a cat, you probably need more, because some cats look very alike and you need a high resolution to distinguish them, and maybe training for the human as well: I know only three breeds of cat, so I wouldn't be able to do it anyway. [00:20:29] What should be the output of the model? Labels: y equals zero for day, y equals one for night, I agree. What should be the last activation of the network, the last function? Sigmoid. We saw that sigmoid takes a number between minus infinity and plus infinity and puts it between 0 and 1, so that we can interpret it as a probability.
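A minimal sketch of the sigmoid just described (plain Python, no framework assumed):

```python
import math

def sigmoid(z):
    """Map any real number to (0, 1), so the output reads as a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Very negative scores go toward 0 ("day"), very positive toward 1 ("night"),
# and a score of 0 gives exactly 0.5.
```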
What architecture would you use? [00:21:05] Fully connected or convolutional? I think later this quarter you will see that convolutional networks perform well in imaging, so we would directly use a convolutional one, but a shallow network, fully connected or convolutional, would do the job pretty well. You don't need a deep network, because you've gauged the complexity of this task. And finally, what should be the loss function? [00:21:39] The log likelihood: it's also called the logistic loss, that's the one you're talking about. The way you get this formula, and you'll prove it in CS229, we're not going to prove it here, is that you interpret your data in a probabilistic way and take the maximum likelihood estimate of the data, which gives you this formula. For those of you who want to go through the math, you can ask in office hours; the TAs are going to help you understand it more properly.
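The logistic loss described here can be sketched as follows (a plain-Python illustration; the `eps` clamp is an added numerical-stability detail, not something mentioned in the lecture):

```python
import math

def logistic_loss(y, y_hat, eps=1e-12):
    """Negative log-likelihood of one binary example, with y in {0, 1}."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)   # keep log() finite
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
```

The loss is small when the prediction agrees with the label and grows without bound as the prediction confidently disagrees.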
[00:22:05] Okay, and of course this means that if y equals zero, we want y hat, the prediction, to be close to zero, and if y equals one, we want y hat to be close to one. Okay, so this was the warm-up; now we're going to delve into face verification. Any questions on the day-and-night classification? Yes. [00:22:48] So your question is about how you choose the size of the test set versus the train set. In general, you would first ask: how many images, or data points, do I need in order to be able to understand what my model does in the real world? This can depend on the task: if I tell you about speech recognition, you want to figure out whether your model is doing well for all accents in the world, so your test set might be very big and very distributed.
In this case, you might have a few examples taken during the day, a few during the night, a few at dawn and sunset or sunrise, and also a few indoor. That's going to give you a number; there's no universally good number, you have to gauge it. Okay, one more question. Yeah, that's a good question: how do you choose the loss function? We're going to see in the next slides how to choose loss functions, but for this one specifically, you choose it because it's a convex function for a classification problem, so it's easier to optimize than other loss functions. There is a proof, but I will not go over it here. If you know the L1 loss, which compares y to y hat: that one is harder to optimize for a classification problem, and we would use it for regression problems instead. [00:24:04] Okay, so our new game is that the school wants to use face verification to validate student IDs in facilities like the gym.
You know, when you enter the gym, you swipe your ID, and then I guess the person sees your face on the screen based on this ID, looks at your face for real, and compares, let's say. So now we want to put a camera there: you swipe, and the camera is going to compare your image to the image in the database, to decide whether to let you in or not. Does that make sense? So what dataset do we need to solve this problem, what should we collect? [00:24:54] Okay, a mapping between the ID and the image, yeah. So schools probably have these databases, because when you enter the school you submit your image and you're given a card, an ID, so you have this mapping. Okay, what else? Pictures of every student labeled with their names, that's what you say. So this is a picture of [inaudible name]: it's the picture from when he was younger, the one he gave to the school when he arrived. [00:25:17] What should be the input of our model? Is it this picture? More photos of him? I'm asking just for the input of the model.
We probably need more photos of him as well, but what's going to be the image we give to the model? Exactly: the person standing in front of the camera when entering the gym. So this is the entrance of the gym, and he's trying to enter; so it's him. Okay, what should be the resolution? Those of you who have done projects in imaging, what do you think the resolution should be? [00:26:09] 256 by 256? Any other idea for the size? I think in general you would go over 400, so 400 by 400. What's the reason? Why do we need 64 for day versus night, and 400 for face verification? Yeah: there are more details to detect, like the distance between the eyes, probably the size of the nose and mouth, general features of the face. These are harder to detect in a 64 by 64 image, and you can test it: you can go outside and show two pictures of people that look like each other.
Ask people whether they can differentiate those two persons or not, and you'll see that below a certain resolution, people sometimes struggle. [00:26:58] Is color important? That's a good question; we should actually have talked about it for day and night. Is color important? Because if you remove the color, you basically divide the number of input values by three, right? So if we could do it without color, we would. In this case, color is going to be important, because you probably want your camera to work in different settings, day and night as well, where the luminosity and brightness are different; and also we all have different colors, and we all need to be detected, compared to each other. I might go somewhere on an island and come back, you know, full of color, but I still want to be able to access the gym. [00:27:40] Outputs: what should be the output?
[00:27:57] resolution but that's the trade-off between computational results so output [00:28:01] between computational results so output is going to be 1 if it's you and 0 if [00:28:04] is going to be 1 if it's you and 0 if it's not you in which case they would [00:28:06] it's not you in which case they would not let you in okay now the question is [00:28:12] not let you in okay now the question is what architecture should be used to [00:28:13] what architecture should be used to solve this problem now that we collected [00:28:15] solve this problem now that we collected the data set of mapping between student [00:28:18] the data set of mapping between student IDs and images [00:28:28] you know how do you know how many images [00:28:31] you know how do you know how many images you need to train the network you don't [00:28:34] you need to train the network you don't know you can find an estimate it's going [00:28:36] know you can find an estimate it's going to depend on your architecture but in [00:28:38] to depend on your architecture but in general the more complex the task the [00:28:40] general the more complex the task the more data you will need and we will see [00:28:42] more data you will need and we will see something called [00:28:42] something called error analysis in about 4 weeks which is [00:28:45] error analysis in about 4 weeks which is once your network works you're going to [00:28:48] once your network works you're going to give it a lot of examples detect which [00:28:50] give it a lot of examples detect which examples are misclassified by your [00:28:52] examples are misclassified by your network and you're going to add more of [00:28:54] network and you're going to add more of these in the training set so you're [00:28:56] these in the training set so you're going to boost your data set ok talking [00:28:59] going to boost your data set ok talking about the architecture if I ask you [00:29:01] about the architecture if I ask you what's the 
If I ask you what's the easiest way to compare two images, like these two images, the database image and the input image, what would you do? [00:29:08] Some sort of hash value? [00:29:16] A standardized function, okay: take this, run it through a specific function, take that, run it through the same function, and compare the two values. That's correct, that's a good idea, and the more basic version is to just compute the distance between the pixels. Just compute the distance between the pixels, and you get whether it's the same person or not. It doesn't work, and a few of the reasons are that the background and lighting can be different: if I subtract this from this, this pixel, which is, let's say, dark, is going to have a value of 0, and this pixel, which is white, is going to have a value of 255. The distance is gigantic, but it's still the same person; that's a problem.
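The naive pixel-distance comparison, and why a lighting change breaks it, can be sketched with toy four-pixel "images" (illustrative only, not lecture code):

```python
def pixel_distance(img_a, img_b):
    """Naive comparison: sum of absolute per-pixel differences."""
    return sum(abs(a - b) for a, b in zip(img_a, img_b))

# The same scene, but the second shot is under much brighter lighting:
dark = [0, 10, 20, 30]
bright = [225, 235, 245, 255]
# pixel_distance(dark, bright) is enormous even though it is the same
# person, which is exactly why raw pixel comparison fails here.
```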
A person can wear makeup, can grow a beard, can be younger in the picture, the ID can be outdated: so it doesn't work to just compare these two pictures directly. We need to find a function that we will apply to these two images, which will give us a better representation of the image. So that's what we're going to do now: we will encode the information of the picture, with the encoding that we talked about, in a vector. We want a vector that represents features like the distance between the eyes, the nose, the mouth, color, hair, all these types of things. So we take the picture from the ID, run it through a network, and hopefully we can find a good encoding from this network. Then we take the picture captured at the facility, run it through the deep network, get another vector, and hopefully, if we trained the network properly, these two vectors should be close to each other.
[00:30:56] Let's say we have a threshold of 0.5. If 0.4 is the distance between these two, it's less than the threshold, so I would say that it's the right person: it's you. Does this scheme make sense? [00:31:11] What does the 128-dimensional vector mean? So the question is: can I say that the third entry corresponds to something specific? It's complicated to say, but depending on what network you choose and what training process you choose, you will get a different network and a different vector. That's what we're going to talk about now. The question is: how do I know that this vector is good? Right now, if I take a random network and give it my image, it's going to output a random vector, and this vector is not going to contain any useful information. I want to make sure that this information is useful, and that's how I will design my loss function.
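The encoding-plus-threshold scheme can be sketched as follows (illustrative; the L2 distance and the 0.5 threshold follow the numbers in the lecture, the function names are made up):

```python
import math

def l2_distance(enc_a, enc_b):
    """Euclidean distance between two encoding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(enc_a, enc_b)))

def same_person(enc_id, enc_camera, threshold=0.5):
    """Admit the student when the two encodings are closer than the threshold."""
    return l2_distance(enc_id, enc_camera) < threshold

# A distance of 0.4 is below the 0.5 threshold: it's the right person.
```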
[00:31:50] Okay, so just to recap: we put all the student faces' encodings in a database. Once we have this, given a new picture, we compute the distance between the new picture and all the vectors in the database and see if we find a match; oh, sorry: we compare the vector of the input image with the vector corresponding to the ID image, and if the distance is small, we consider that it's the same person. [00:32:11] Okay, now let's talk about the loss and the training. To figure out whether this vector corresponds to something meaningful, first we need more data, because we need our model to understand the features of the face in general, and a university that has a thousand students is probably not going to have enough: a thousand images won't push a model to understand all the features of the face. Instead, we will go online, find open datasets with millions of pictures of faces, and have the model learn from those faces, to then use it inside the facility. [00:32:47] There was a question in the back.
Like we did with [inaudible], but with a one-hot per student? That's another option. So the question is: why can't you use a one-hot encoding? We could build a classifier that has n output neurons, corresponding to the number of students in the school; you take an image, run it through the network, and it tells you which student it is. What's the issue with that? Every year, students enter the school, so you would have to modify your network every year, because you have more students and you need a larger output vector. We don't want to retrain our networks all the time. [00:33:28] Okay, so what we really want, if we want to put it in words, is, oh, there's a mistake here, what we really want is: if I give you two pictures of the same person, I want a similar encoding, I want the vectors to be similar.
I want the vectors to be very different. [00:33:52] We are going to rely on these two assumptions, these two bullet points, to build our loss function, by giving it triplets. A triplet means three pictures: one that we call the anchor, which is a picture of a person; one that we call the positive, which is the same person as the anchor but a different picture of that person; and a third that we call the negative, which is a picture of someone else. What we want is to minimize the encoding distance between the anchor and the positive, and to maximize the encoding distance between the anchor and the negative. Do these two thoughts make sense? So now my question for you is: what should be the loss function? Please go on Menti and enter the code; there are three options, A, B and C. Choose which of these you think is the right loss function to use for this problem.
[00:34:53] You have it on your phone as well; I know it is small on the screen and cut off, it is better here. [Music] Eight four five seven zero nine. Can you see it on your phone? By Enc(A) I mean the encoding vector of the anchor; by Enc(P) I mean the encoding vector of the positive image, after you run them through the network. Okay, 30 more seconds... 20 more seconds... okay, let's see what we have.
[00:37:11] So two thirds of the people think it is the first answer, A. I will read it for everyone: the loss is equal to the L2 distance between the encoding of A and the encoding of P, minus the L2 distance between the encoding of A and the encoding of N. Someone who answered this: do you want to give an explanation?
[00:37:39] "Yes: we are trying to minimize the first difference, between A and the positive, and to maximize the difference between A and the negative; when you subtract, the second part is responsible for the maximizing." Yes, that's correct. I will repeat it for the students watching the video: we want to maximize the distance between the encoding of A and the encoding of the negative, and that is why we have a minus sign here; we want the loss to go down, and with the minus sign it goes down as we maximize that term. On the other hand, we want to minimize the other term, because it is a positive term. Okay, do we all agree? That was the first time we used this tool; it will be quicker next time.
[00:38:28] So we have figured out what the loss function should be, and now think about it: now that we have designed our loss function, we are able to use an optimization algorithm. Run an image through the network... sorry, run three images through the network, get three outputs (the encoding of A, the encoding of P, the encoding of N), compute the loss, take the gradients of the loss, and update the parameters in order to minimize the loss. Hopefully, after doing that many times, we get an encoding that represents the features of the face, because the network will have had to figure out who are the same people and who are different people. Does that make sense? This is called the triplet loss.
[00:39:14] And I cheated a little bit in the quiz: I did not write the alpha. The true loss function contains a small alpha. Do you know why? "So you don't have a negative loss?" That is not exactly the role of the alpha: to avoid a negative loss, what you can do is take the maximum of the loss and zero, and train on that. But there is another reason why we have this alpha.
[00:39:55] "Is it about false negatives?" No, it is not about that. Sometimes you have an alpha in a loss function to put a weight on some classes, but this is an additive alpha, not a multiplicative alpha, so it has nothing to do with that. "To penalize large weights?" Are you talking about regularization? If we had the weights in this formula, next to the alpha, like alpha times the norm of the weights, that would be regularization; but here this term does not penalize the weights, and it is not going to affect the gradient. The reason we have it here is this: suppose the encoding function is just the zero function. Then we would have encoding of A minus encoding of P equal to zero minus zero, and here zero minus zero, so we would get a "perfect" loss of zero, and we still did not train our network; it just learned the zero function. So this alpha is called the margin, and it pushes your network to learn something meaningful instead of stabilizing at zero.
[00:41:18] It also has to do with initialization, but we have not talked about initialization yet; I think we have only seen zero initialization so far. Another way to keep the network from becoming stable at zero is to change the initialization scheme, and in two weeks we are going to see different initialization schemes together.
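Putting the discussion above together, the triplet loss with the margin alpha (and the clamp at zero mentioned a moment ago) can be sketched in NumPy. The margin value 0.2 and the 128-dimensional encodings are illustrative assumptions, not values given in the lecture.

```python
import numpy as np

def triplet_loss(enc_a, enc_p, enc_n, alpha=0.2):
    """L = max(||Enc(A) - Enc(P)||^2 - ||Enc(A) - Enc(N)||^2 + alpha, 0).
    alpha is the margin; 0.2 is an illustrative value."""
    pos = np.sum((enc_a - enc_p) ** 2)  # pull anchor and positive together
    neg = np.sum((enc_a - enc_n) ** 2)  # push anchor and negative apart
    return max(pos - neg + alpha, 0.0)

# The degenerate all-zero encoding from the discussion above:
zero = np.zeros(128)
print(triplet_loss(zero, zero, zero))  # 0.2 -> the margin keeps this loss positive
```

Without the margin, the zero function would achieve a "perfect" loss of 0; with it, collapsing all encodings to zero still costs alpha, so the network has to learn something meaningful.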
[00:41:51] The question is: how do we know that this network is going to be robust to rotations of the image, or scaling of the image, or translation of the image? We know it because of the dataset: we are going to give, say, your picture and your picture scaled, and tell the network that this is the same person, so the network will have to learn that a change of scale does not mean it is a different person. It has to learn this feature. Okay, one more question and then we move on.
[00:42:20] Yeah, good question: why is it a problem to stabilize at zero? The loss function is kept positive: in the paper that you can find, the FaceNet paper, they do not train exactly this loss; they train the maximum of this loss and zero. Okay, so you train and you get the right function.
[00:42:44] Now let's make the problem a little more complicated. What we did so far was face verification; now we are going to do face recognition. What's the difference? There is no more ID: you just have a camera in the facility, you enter, the camera looks at you and finds you. How would you design this new network? Yes, in the back.
[00:43:17] "You've added an element of recognition as well: before, you would just stand in front of it and you knew that every picture had a face; now it needs to detect the face." Okay, so you are saying maybe we need to add an element to the pipeline, a detection element. That's true: in general, for face recognition, if you have a picture that is quite big, you want a first network that finds the face on the picture, detects it, crops it, and gives the crop to another network. That could be used in verification as well. Great.
[00:43:56] So the difference may be small, and what you are saying is that maybe we can reuse the verification algorithm we trained, but instead of a one-to-one comparison we do a one-to-n comparison. We have the pictures of all the students in the database; we run all these database pictures through the model and get a vector that represents each of them. Now you enter the facility, we take your picture, run it through the model, get your vector, and compare this vector to all the vectors in the database to identify you. What's the complexity of this? It is the number of students: for every prediction we go over the whole database. A common model you can use to do that is k-nearest neighbors. Of course, if you have only one picture per student it is not going to be very precise, but if you collect three pictures per student and run a two-nearest-neighbors algorithm, you would decide that if the two nearest pictures are of the same person, it is likely that you are the person in those two pictures. Okay.
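The one-to-n identification just described can be sketched as a nearest-neighbor search over the database of encodings. The toy two-dimensional vectors, the names, and the two-nearest-neighbors agreement rule are illustrative assumptions.

```python
import numpy as np

def identify(query, db_embeddings, db_labels, k=2):
    """1-to-n face recognition: compare the query encoding against every
    encoding in the database (O(n) per prediction), take the labels of the
    k nearest neighbors, and accept only if they agree."""
    dists = np.linalg.norm(db_embeddings - query, axis=1)
    nearest = np.argsort(dists)[:k]
    found = {db_labels[i] for i in nearest}
    return found.pop() if len(found) == 1 else None  # agreement -> identity

# Toy database: three encodings per student (2-dimensional for illustration)
db = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # "alice"
               [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])  # "bob"
labels = ["alice"] * 3 + ["bob"] * 3

print(identify(np.array([0.05, 0.05]), db, labels))  # alice
print(identify(np.array([5.05, 5.05]), db, labels))  # bob
```

The linear scan over the database is exactly the cost discussed above: one comparison per stored encoding for each prediction.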
[00:45:09] Now let's make it a little more complicated. You have probably seen it on your phone: sometimes you take a picture and it recognizes that it is your grandmother, or your grandfather, or your mother and father. What's happening behind is that there is some clustering going on: we have a bunch of images and we want to cluster them together. This is another algorithm that you have seen in CS229 and CS229A, the k-means algorithm, which is a clustering algorithm. Say you have a phone with thousands of pictures of, let's say, 20 different people. What you want is to cluster all the pictures of the same person separately. What you will do is encode all the pictures into vectors and then run a clustering algorithm like k-means in order to group them: these are the vectors that look like each other, and these are the vectors that look like each other. Okay, and then you can simply give folders to the users, with all the pictures of your mom and all the pictures of your dad.
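The clustering step can be sketched with a minimal hand-rolled k-means over encoding vectors (in practice you would use a library implementation; the toy data, the naive initialization, and k = 2 are assumptions for illustration).

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Minimal k-means: cluster encoding vectors into k groups so that
    pictures of the same person end up in the same 'folder'.
    Naive init (first k points); assumes no cluster empties on this toy data."""
    centers = X[:k].copy()
    for _ in range(iters):
        # assign each vector to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # move each center to the mean of its cluster
        centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
    return assign

# Toy encodings: two tight groups standing in for two people's photos
X = np.array([[0.0, 0.0], [0.1, 0.1], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.1], [5.0, 5.1]])
assign = kmeans(X, k=2)
print(assign[:3], assign[3:])  # [0 0 0] [1 1 1]
```

Each resulting cluster corresponds to one "folder" of pictures of the same person.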
[00:46:24] Good question: how do you define the k? Does someone have an idea? [00:46:41] So one way, as you said, is to try different values, train the clustering algorithm, and look at a certain loss in order to choose the k. There is actually an algorithm called X-means that is used to find the k; you might search for that. There is also a method called the elbow method that you may want to look up as well to determine the k. Okay. And as you said, maybe we need to detect the face first, then crop it and give it to the algorithm. One more question on face verification.
[00:47:22] ... "Do you need to use the vector that you trained for classification?" Sorry, I don't understand... oh, you mean: where is the encoding coming from in the network? Okay, good question.
So you have a deep network, and you want to decide where you should take the encoding from. In general, the more complex the task, the deeper you go. For face verification, what you want, and you know it as a human, are features like the distance between the eyes, the nose, and so on, so you have to go deep: you need the first layers to figure out the edges and give them to the second layer; the second layer figures out the nose and the eyes and gives them to the third layer; the third layer figures out the distance between the eyes, the distance between the ears. So you go deeper and take the encoding from a deep layer, because you know you want high-level features. Okay.
[00:48:33] Art generation: given a picture, make it look beautiful. As usual: data. What do we need? [00:48:48] It's a little complicated, because we have to define what "beautiful" is. So, data: some beautiful pictures. I know, maybe my concept of beautiful is different from yours.
[00:49:06] "Aim at a certain style?" Let's go with that, that's a good point: we might say that beautiful means paintings, since paintings are usually beautiful, so you want a certain kind of style. Yeah, that's true. So let's say we have the data that we want. The way we define this problem is: let's take an image that we call the content image (here again you have the Louvre Museum), and let's take an image that we call the style image, a painting that we find beautiful. What we want is to generate an image that has the content of the content image, but painted by the painter of the style image. This style image is a Claude Monet, and here we have the Louvre painted by Claude Monet, even though he was dead when this pyramid was created. So that's our goal, and this is what we would call art generation. There are other methods, but this is one.
[00:50:08] So how do we do that? What architecture do we need? Please try to use what we have seen in the past two applications together: what training scheme, what architecture? Anyone want to try?
[00:50:55] You are saying we take some style images, give them as input to a network, and the network outputs yes or no, one or zero? We want to generate an image, probably.
[00:51:24] So what you are proposing is: we take an image that is the content image, and we have a style network which will style this image, so we get a styled version of the content; it uses certain features of the style and changes the image according to what the network has learned. This is actually done; it is one method, but not the one we will see today. The small issue with this method is that you have to train one network per style.
learn one style network learns one style you give the content it [00:52:03] learns one style you give the content it gives you the constant with the specific [00:52:04] gives you the constant with the specific style of the model what we want to do is [00:52:07] style of the model what we want to do is to have no model that is restricted to a [00:52:09] to have no model that is restricted to a specific style I want to be able to give [00:52:12] specific style I want to be able to give a painting of Picasso and get this [00:52:14] a painting of Picasso and get this picture painted by Picasso so the [00:52:18] picture painted by Picasso so the difference here is that we're not we're [00:52:20] difference here is that we're not we're not going to learn parameters of a [00:52:22] not going to learn parameters of a network like we did for face [00:52:23] network like we did for face verification or for the in a [00:52:25] verification or for the in a classification we're going to learn an [00:52:27] classification we're going to learn an image so remember when we talked about [00:52:30] image so remember when we talked about back propagation of the gradient to the [00:52:32] back propagation of the gradient to the parameters we're not going to do that [00:52:34] parameters we're not going to do that we're going to back propagate all the [00:52:36] we're going to back propagate all the way back to the image let's see how it [00:52:39] way back to the image let's see how it works so first we have to understand [00:52:43] works so first we have to understand what content means and what style means [00:52:44] what content means and what style means to do that we're going to use encoding [00:52:47] to do that we're going to use encoding we're going to to use the ideas that we [00:52:49] we're going to to use the ideas that we talked about later [00:52:50] talked about later giving the content image to a network [00:52:53] giving the content image to a network that is very 
good will allow us to [00:52:55] that is very good will allow us to extract some information about the [00:52:57] extract some information about the content of this image we specifically [00:53:00] content of this image we specifically sew together that earlier layers we [00:53:02] sew together that earlier layers we detect the edges the edges are usually a [00:53:05] detect the edges the edges are usually a good representation of the content [00:53:09] good representation of the content so I might have a very good Network give [00:53:12] so I might have a very good Network give my contents image extract the [00:53:14] my contents image extract the information from the first layer this [00:53:15] information from the first layer this information is going to be the content [00:53:17] information is going to be the content of the image now the question is how do [00:53:19] of the image now the question is how do I get the style I want to give my style [00:53:24] I get the style I want to give my style image and find a way to extract the [00:53:26] image and find a way to extract the style that's what we're going to learn [00:53:29] style that's what we're going to learn later in this course it's a technique [00:53:31] later in this course it's a technique called Graham matrix and the important [00:53:33] called Graham matrix and the important thing to remember is that the style is [00:53:35] thing to remember is that the style is non localized information if I show you [00:53:39] non localized information if I show you the pictures in the previous slide sorry [00:53:43] the pictures in the previous slide sorry here [00:53:45] here you see that in the generated picture [00:53:47] you see that in the generated picture although on the style image there was a [00:53:49] although on the style image there was a tree on the left side there is no tree [00:53:52] tree on the left side there is no tree on the generated image it means when I [00:53:55] on the generated image it 
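The Gram matrix mentioned here (covered properly later in the course) can be sketched as follows. The point of the sketch is the "non-localized" property: shuffling the spatial positions of a feature map leaves the Gram matrix unchanged. The feature-map shape is an illustrative assumption.

```python
import numpy as np

def gram_matrix(features):
    """features: activations of one layer, shape (channels, height, width).
    Flatten the spatial dimensions and take all channel-to-channel dot
    products: G[i, j] measures how strongly feature i fires together with
    feature j, regardless of WHERE in the image they fire."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T  # shape (channels, channels)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 4))  # toy feature map: 8 channels, 4x4 spatial

# Shuffle the spatial positions (e.g. move the 'tree' somewhere else):
perm = rng.permutation(16)
shuffled = feats.reshape(8, 16)[:, perm].reshape(8, 4, 4)

# The Gram matrix is unchanged, so it captures no spatial (localized) info.
print(np.allclose(gram_matrix(feats), gram_matrix(shuffled)))  # True
```

This is why the generated image can pick up Monet's technique without copying the tree: the Gram matrix throws away the positions and keeps only the feature co-occurrence statistics.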
[00:53:55] It means that when I extracted the style, I extracted only non-localized information: what technique has Claude Monet used to paint? I did not want to extract the tree that was in the style image; I don't want its content. Okay. So we are going to take a network that understands images very well, and these are common online: you can find ImageNet classification networks that were trained to recognize thousands of objects. Such a network is going to understand basically anything you give it. If I give it the Louvre Museum, it is going to find all the edges very easily, figure out that it is daytime, figure out that there are buildings on the sides, and all the features of the image, because it was trained for months on thousands of classes.
[00:54:44] Let's say we have this network. We give our content image to it and extract information from the first few layers; this information we call content C, the content of the content image. Does that make sense? Now I give it the style image, and I use another method, called the Gram matrix, to extract style S, the style of the style image. Okay, and now the question is: what should be the loss function? Let's go on Menti, same code as usual; just open it, and if you want me to repeat the code: eight four five seven zero nine. These are the three proposals for the loss function. A reminder: content C means the content of the content image; style S means the style of the style image; style G means the style of the generated image; content G means the content of the generated image. Take a minute; it is a bit small on the screen, use the code.
[00:56:57] What? So, just repeating the question: why do we need to use an ImageNet network, since we don't really need to classify any image and it is going to waste time? The reason we
it's gonna waste time. the reason we [00:57:08] need ImageNet is because ImageNet [00:57:10] understands our pictures. so if you [00:57:12] give the content image to a network [00:57:15] that doesn't understand pictures very [00:57:16] well, you're not going to get the edges [00:57:19] very well. so you want a network where you [00:57:23] don't care about the classification [00:57:24] output; you just cut the network in the [00:57:26] middle and extract the layers in the middle. [00:57:28] okay, let's see what the answers are [00:57:31] according to you guys. so yeah, I repeat: [00:57:41] we're not training anything here — we're [00:57:43] taking a model that exists and we use [00:57:46] this model; we're going to talk about the [00:57:48] training after. okay, someone has [00:57:51] answered the second question and I will [00:57:53] read it out loud: the loss is the L2 [00:57:56] difference between the style of the [00:57:57] style image and the generated style, plus [00:58:00] the L2 distance between the generated [00:58:03] image's content and the content [00:58:05] image's content. yeah. [00:58:16] so yeah, we want to minimize both terms [00:58:19] here. we want the content of the [00:58:22] content image to look like the content [00:58:23] of the generated image, so we want to [00:58:25] minimize the L2 distance of these two, [00:58:27] and the reason we use a plus is because [00:58:29] we also want to minimize the difference [00:58:31] of styles between the generated and the [00:58:32] style image. so you see we don't have any [00:58:35] term that says the distance between the style of the content [00:58:37] image and the style of the generated image is [00:58:40] minimized — this is the loss we want. okay. [00:58:46] okay, so just going over the [00:58:53] architecture again: the loss function [00:58:56] we're going to use will be the one we [00:58:59] saw, and one thing that I want to [00:59:02] emphasize here is we're not training the [00:59:04] network — there's no parameter that we [00:59:06] train. the parameters are in the ImageNet [00:59:08] classification network; we use them, [00:59:10] we don't train them. what we will train [00:59:12] is the image. so you get an image, you [00:59:15] start with white noise, you run this [00:59:18] image through the classification network, [00:59:20] but you don't care about the [00:59:22] classification of this image — ImageNet [00:59:24] is going to give a random class to this [00:59:25] image, totally random. [00:59:28] instead you will extract content G and [00:59:32] style G. so from this image, you run [00:59:36] it and you extract information from this [00:59:39] network using the same techniques that [00:59:40] you've used to extract content C and [00:59:43] style S. so content C and style S, you [00:59:45] have them; you're able to compute [00:59:48] the loss function, because now you have [00:59:50] the four terms of the loss function. you [00:59:53] compute the derivatives, and instead of [00:59:55] stopping in the network, you go all the [00:59:58] way back to the pixels of the image, and [01:00:00] you decide how much should I move the [01:00:02] pixels in order to make this loss go [01:00:04] down. and you do that many times, [01:00:07] and the more you do that, [01:00:08] the more this is going to look like the [01:00:11] content of the content image and the [01:00:12] style of the style image.
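The loop just described — freeze a pretrained feature extractor, compute the four terms, and gradient-descend on the pixels of the generated image — can be sketched as below. This is a toy illustration, not the lecture's code: the "network" is a single fixed random linear layer standing in for pretrained conv layers, the image sizes are invented, and the Gram matrix is used for the style terms as the lecture describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen stand-in for a pretrained ImageNet network: one fixed linear
# "layer" mapping a 27-pixel image to 4 feature channels x 9 positions.
# These weights are NEVER updated -- only the generated image is.
W = 0.1 * rng.standard_normal((4, 9, 27))

def features(img):
    """Activations of the frozen layer, shape (channels, positions)."""
    return np.einsum('cpx,x->cp', W, img)

def gram(F):
    """Gram matrix: cross-correlation of channels, with spatial position
    summed out -- which is exactly why the extracted style is non-localized."""
    return F @ F.T

content_img = rng.standard_normal(27)       # the content image
style_img = rng.standard_normal(27)         # the style image

content_C = features(content_img)           # content C: content of the content image
style_S = gram(features(style_img))         # style S: style of the style image

def loss(F):
    """||content C - content G||^2 + ||style S - style G||^2."""
    return np.sum((content_C - F) ** 2) + np.sum((style_S - gram(F)) ** 2)

g = rng.standard_normal(27)                 # generated image: start from white noise
lr = 1e-3
first = loss(features(g))

for _ in range(2000):
    F = features(g)                         # content G, and style G via gram(F)
    # analytic gradient of the loss w.r.t. the feature maps
    dF = -2 * (content_C - F) - 4 * (style_S - gram(F)) @ F
    # ...then all the way back to the PIXELS; the network stays frozen
    g -= lr * np.einsum('cpx,cp->x', W, dF)

final = loss(features(g))                   # much smaller than `first`
```

The style image touches the network once (to compute `style_S`); after that only `g` is updated, which mirrors the "we train the image, not the network" point above.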
yeah — yeah, so [01:00:21] the downside of this network is that, although [01:00:24] it has the flexibility [01:00:25] to work with any style and any content, every time [01:00:28] you want to generate an image you have [01:00:29] to do this training loop, while the other [01:00:31] network that you talked about doesn't [01:00:33] need that, because that model is trained [01:00:34] to convert the content to a style; you [01:00:36] just give it the image. which network are you talking [01:00:46] about — this network? yeah. so, do we need to [01:00:48] train this network on Monet images? [01:00:50] usually not: this network is trained on [01:00:53] millions of images; [01:00:54] it's basically seen everything you can [01:00:57] imagine. what do you mean, back-propagate [01:01:06] properly? here you're not training the [01:01:08] network — you're giving it this image, [01:01:10] computing the back-propagation, and going [01:01:12] back to the image, only updating the [01:01:14] image; you don't update the network. it [01:01:19] comes from content C and style S — it comes [01:01:21] from style S. so for the loss function, [01:01:24] the baseline is: you have [01:01:26] content C and style S, because you've [01:01:28] chosen a content picture and a style [01:01:29] picture, and now at every step you [01:01:32] will find the new content G and style G, [01:01:35] back-propagate, update, give it again, get [01:01:38] the new content G and style G, update [01:01:40] again, and so on. no — the art, you never [01:01:45] touch it; [01:01:46] the art image just [01:01:48] touches the neural network one time: you [01:01:50] extract style S, and then that's [01:01:51] all, you don't use it again. okay, let's do [01:01:54] one more question — yeah. good question: why [01:02:00] do you start with white noise instead of [01:02:02] the content or the style? actually, do you [01:02:04] think it's better to start with the [01:02:05] content or the style? probably the style? [01:02:09] I think probably the content, because [01:02:13] the edges at least looking like the content [01:02:15] is going to help the network converge [01:02:19] quicker — yeah, that's true. [01:02:20] you don't have to start with white noise; [01:02:21] in general the baseline is to start with [01:02:23] white noise so that anything can happen. [01:02:25] if you give it the content to start with, [01:02:27] it's going to have a bias towards the [01:02:28] content, but if you train longer it's fine. [01:02:31] okay, one more question and then we can — [01:02:40] so, ImageNet doesn't understand what's content [01:02:42] and style, but ImageNet finds the edges [01:02:45] in the image, and so you can give it the [01:02:48] content image and extract the first few [01:02:49] layers to get information about them, [01:02:51] because when it was trained on [01:02:53] classification it needed to find the [01:02:55] edges — to find that a dog is a dog, you [01:02:58] first need to find the edges of the dog — [01:02:59] so it's trying to do so. and for the [01:03:02] style, it's complicated to understand the [01:03:05] style, but the network finds all the [01:03:07] features in the image, and then we use a [01:03:09] post-processing technique that is called [01:03:10] the Gram matrix in order to extract [01:03:12] what we call style; it's basically a [01:03:15] cross-correlation of all the features of [01:03:17] the network. we will learn it together [01:03:18] later. okay, let's move on to the
next application, because we don't have too [01:03:24] much time. so this is the one I prefer: [01:03:27] given a 10-second audio speech, detect [01:03:29] the word "activate". so you know, we talked [01:03:31] about trigger word detection, and there [01:03:33] are many companies that have this wake [01:03:34] word thing, where you have a device at [01:03:36] home and when you say a certain word it [01:03:38] activates itself. so here's the same [01:03:40] thing for the word "activate". what data do [01:03:42] we need — do we need a lot or not? probably [01:03:51] a lot, because there are many accents, and [01:03:53] one thing that is counterintuitive is [01:03:54] that if two humans — let's say [01:03:58] two women — speak, as a human you [01:04:02] would say these voices are pretty [01:04:05] similar, right, you can detect the word. [01:04:08] what the network sees is a list of [01:04:12] numbers that are totally different from [01:04:13] one person to another, because the [01:04:16] frequencies we use in our voices are [01:04:17] totally different from each other. so the [01:04:19] numbers are very different, although as a [01:04:21] human we feel that it's very similar. so [01:04:25] we need a lot of ten-second audio clips — [01:04:28] that's it. what should be the [01:04:31] distribution? it should contain as many [01:04:33] accents as you can, as many female and male [01:04:36] voices, kids, adults, and so on. what should [01:04:41] be the input of the network? it should be [01:04:44] a 10-second audio clip [01:04:45] that we can represent like that: the [01:04:47] 10-second audio clip is going to contain [01:04:49] some positive words, in green — the positive [01:04:52] word is "activate" — and it's also going to [01:04:55] contain negative words, in pink, like [01:04:58] "kitchen", "lion", whatever words that are not [01:05:03] "activate", and we want only to detect the [01:05:05] positive word. what should be the sample [01:05:08] rate? again, same question: you would test [01:05:11] on humans, and you would [01:05:14] also talk to an expert in speech recognition [01:05:16] to know what's the best sample [01:05:18] rate to use for speech processing. what [01:05:22] should be the output — any ideas? [01:05:34] okay, any other ideas? classification, yes/no — so [01:05:39] 0 or 1. actually,
let's do a test. [01:05:43] so we have three audio [01:05:46] speeches here: speech 1, speech 2, speech 3. [01:05:49] I don't know if we have the sound [01:05:51] here — do we have the sound? [01:05:57] maybe we'll have it now, okay, let's try. [01:06:06] (a clip in Italian plays) so this is [01:06:11] labeled 1. nobody speaks Italian? on to the [01:06:18] second one. (two more clips play) [01:06:31] okay, what's the wake word? has anybody [01:06:36] found what the trigger word was? we [01:06:40] need more. so, you know what's fun: this [01:06:45] is a valid scheme for labeling — it's [01:06:47] definitely possible — but it seems that [01:06:49] even for humans this labeling scheme is [01:06:51] super hard; we're not able to find [01:06:54] what's happening. I don't know — even [01:06:56] though I made this slide, I don't even [01:06:57] remember it today. now let's try [01:07:01] something else. [01:07:01] okay, so now we have a different labeling [01:07:05] scheme that tells us also
where the wake [01:07:08] word is happening. let's hear it again. [01:07:12] (the Italian clips play again) [01:07:34] okay, what's the trigger word? "pomeriggio" — yeah, "pomeriggio" means [01:07:37] afternoon in Italian. so you see, what I'm [01:07:42] trying to illustrate is: compare the [01:07:46] human to the computer and you will get [01:07:48] what's the right labeling scheme to use, [01:07:49] and of course the labeling scheme here [01:07:52] is going to be better for the model [01:07:54] than the first one — and we just [01:07:56] proved it. the important thing is to know [01:08:00] that the first one would also work; we [01:08:02] just need a ton of data — we need a lot [01:08:05] more data to make the first labeling [01:08:06] scheme work than we need for the second [01:08:08] one. does that make sense? [01:08:11] so yeah, we will use something like that. [01:08:19] good question, actually — this is not the [01:08:22] best labeling
scheme. as you said: should [01:08:25] the one come before or after the word [01:08:27] was said? what do you guys think — [01:08:30] before? after? yeah. you will see that [01:08:34] recurrent neural networks are [01:08:37] basically going to look at the data just as [01:08:40] humans do — temporally, from the [01:08:42] beginning to the end — and in this case [01:08:44] you need to hear the word in order to [01:08:46] detect it, so we're going to put the one [01:08:48] right after the word was said. another [01:08:50] issue that we have with this is that [01:08:52] there are too many zeros — it's highly [01:08:54] unbalanced, so the network is pushed to [01:08:56] always predict zeros. so what we do, as a [01:08:58] hack — and there are a lot of hacks like [01:09:00] that happening in papers, if you read [01:09:02] them — is we're going to add several ones [01:09:03] after the word; let's say I would add [01:09:06] twenty ones, basically. okay, so this is [01:09:10] our labeling scheme. now, what should be [01:09:13] the last activation of our network? [01:09:22] sigmoid function — yeah, sigmoid, but [01:09:25] sequential: for every time step you
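That labeling scheme — all zeros, then a fixed run of ones starting right after the wake word ends — is easy to write down. The run of twenty ones follows the lecture; the helper name and the 1000-step output length are invented for illustration.

```python
import numpy as np

def trigger_labels(num_steps, word_end_steps, ones_per_word=20):
    """0/1 targets for trigger-word detection: zeros everywhere, then a run
    of `ones_per_word` ones starting right AFTER each time step where the
    wake word ends (you must hear the word before you can detect it).
    The run of ones is the class-imbalance hack from the lecture."""
    y = np.zeros(num_steps, dtype=int)
    for t in word_end_steps:
        y[t + 1 : t + 1 + ones_per_word] = 1
    return y

# hypothetical example: 1000 output time steps, "activate" ends at step 400
y = trigger_labels(1000, [400])
```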
would use a sigmoid to output 0 or 1, basically. [01:09:31] don't worry if you don't understand [01:09:33] specifically what networks we're using — [01:09:35] you're going to learn it in a few weeks. [01:09:37] so the architecture should probably be [01:09:39] like a recurrent neural network; [01:09:42] convolutional networks might work as [01:09:44] well, we'll see it later on in the course. [01:09:47] and the loss function should be the same [01:09:49] as before, but we should make it [01:09:50] sequential: for every time step we should [01:09:52] use a loss function like that, and we [01:09:54] should sum them over all the time steps. [01:09:57] sounds good? so another insight on this [01:10:02] project I'll point out is: what was [01:10:05] critical to the success of this project? [01:10:07] I think there are two things that are [01:10:08] really critical when you build [01:10:10] such a project. the first one is to have [01:10:13] a strategic data acquisition [01:10:16] pipeline. so let's talk more about that: [01:10:19] we said that our data should be [01:10:21] 10-second
audio clips that contain [01:10:23] positive and negative words from many [01:10:25] different accents. [01:10:27] how would you collect this data? [01:10:38] right. [01:10:42] yes — you said you'd pay people to give you [01:10:50] ten seconds of their voice. yes, but I [01:10:55] think you can take your phone and go [01:10:57] around campus — and that's actually how we [01:10:59] did it: we took our phones, we went around [01:11:02] campus, and we got some audio recordings. [01:11:04] so one way to do it is to go and [01:11:07] get ten-second audio recordings from [01:11:09] different people with a large [01:11:11] distribution of accents, and then what do [01:11:13] you do — you label, you label by hand. [01:11:16] that's one method. is it long or short, is [01:11:20] it quick or not? it's super slow, [01:11:23] yeah. oh — [01:11:27] subtitles in movies, all right, that's a [01:11:29] good idea actually: depending [01:11:32] on the licensing of the movie, you could [01:11:36] take the audio from a movie, and you [01:11:39] get the subtitles, and you're looking for [01:11:41] "activate", and every time the subtitles [01:11:43] say "activate" you could label your data. [01:11:45] that's super fun — that's pretty good [01:11:47] actually, you could label automatically [01:11:49] using that. yeah, so that's a good idea. I [01:11:52] think there's another way to do it that [01:11:54] is close to that, which is: we're going [01:11:56] to collect three databases. the first one [01:11:59] is going to be the positive word [01:12:01] database, the second one is going to be [01:12:03] the negative word database, and the third one [01:12:05] is going to be the background noise [01:12:07] database. so I take the background, ten [01:12:12] seconds; I insert randomly from one to [01:12:16] three negative words, and I insert [01:12:18] randomly from one to three positive [01:12:20] words, making sure it doesn't overlap [01:12:23] with a negative word. okay, [01:12:26] what's the main advantage of this method? [01:12:31] programmatic generation of examples — yeah, [01:12:34] programmatic generation of samples and [01:12:35] automated labeling. I don't hand-label: I know [01:12:39] where I inserted my positive words, so I [01:12:42] just add ones where I inserted them. I can [01:12:45] generate millions of data examples like [01:12:47] that, just because I found the [01:12:49] right strategy to create data. you see the [01:12:52] difference between the two methods — the [01:12:53] one where you have to go out and [01:12:56] collect data, and the one where you just [01:12:58] go out, collect positive words, negative [01:13:01] words, and then find background noise on [01:13:03] YouTube or wherever you have the right [01:13:05] license to use it. it's a big [01:13:08] difference, and this can make one [01:13:11] company succeed compared to another [01:13:12] company — [01:13:13] it's very common. okay, so I would go on [01:13:16] campus, take one-second audio clips of [01:13:19] positive words, put them in the database in [01:13:21] green; take one-second audio clips of [01:13:23] negative words, of the same people as [01:13:25] well, put them in the pink database; and get [01:13:28] background noise from anywhere I can [01:13:29] find it — it's very cheap — and then create [01:13:31] the synthetic data, label it [01:13:33] automatically, and you know, with like [01:13:36] five positive words, five negative words, [01:13:39] five backgrounds, you can create a lot of
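A sketch of that programmatic generation, assuming word clips and background are plain sample arrays and that naive additive mixing is good enough for illustration (the function name, clip lengths, and counts here are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_example(background, positive_clips, negative_clips,
                       n_pos=2, n_neg=2):
    """Overlay randomly chosen word clips onto a background at random,
    non-overlapping offsets. Returns the mixed audio plus the end sample
    of every positive word -- which is all you need to auto-label."""
    mix = background.copy()
    taken = []                                  # (start, end) spans already used

    def place(clip):
        for _ in range(1000):                   # rejection-sample a free slot
            s = int(rng.integers(0, len(mix) - len(clip)))
            if all(s + len(clip) <= a or s >= b for a, b in taken):
                taken.append((s, s + len(clip)))
                mix[s:s + len(clip)] += clip    # naive additive mixing
                return s + len(clip)
        raise RuntimeError("could not find a free slot")

    for i in rng.integers(0, len(negative_clips), n_neg):
        place(negative_clips[i])
    pos_ends = [place(positive_clips[i])
                for i in rng.integers(0, len(positive_clips), n_pos)]
    return mix, pos_ends

# toy "clips": a 16000-sample background, 1600-sample one-second words
bg = 0.01 * rng.standard_normal(16000)
pos = [rng.standard_normal(1600) for _ in range(3)]   # "activate" clips
neg = [rng.standard_normal(1600) for _ in range(3)]   # negative-word clips
audio, ends = synthesize_example(bg, pos, neg)
```

Because `ends` records exactly where each positive word finishes, the label vector (ones right after each end) falls out for free.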
[01:13:45] data points. okay, [01:13:47] so this is an important technique that [01:13:48] you might want to think about in your [01:13:51] project. the second thing that is [01:13:53] important for the success of such a [01:13:55] project is the architecture search and [01:13:58] hyperparameter tuning. so all of you, you [01:14:01] will have complicated projects where you [01:14:02] will be lost [01:14:05] regarding the architecture to use. [01:14:08] at first it's a complicated process to [01:14:10] find the architecture, but you should not [01:14:11] give up, and the first thing I would say [01:14:14] is: talk to the experts. so let me tell [01:14:18] you the story of this project. first, I [01:14:22] started looking at the literature [01:14:24] and figuring out what network I could [01:14:26] use for this project, and I ended up [01:14:28] using that: for the beginning part, I use [01:14:30] a Fourier transform to extract features [01:14:32] from the speech. who's familiar with [01:14:35] spectrograms or Fourier transforms? so, [01:14:37] for the others: think about audio speech [01:14:40] as a 1-D signal, but every
one this signal can be decomposed in a sum of sines and [01:14:43] can be decomposed in a sum of sines and cosines with a specific frequency and [01:14:45] cosines with a specific frequency and amplitude for each of these and so I can [01:14:48] amplitude for each of these and so I can convert a 1d signal into a matrix for [01:14:51] convert a 1d signal into a matrix for with with with basically [01:14:59] basically one axis that is the frequency [01:15:02] basically one axis that is the frequency one axis that is the time going from [01:15:06] one axis that is the time going from going from zero to ten seconds and I [01:15:11] going from zero to ten seconds and I will get the value of all the the [01:15:14] will get the value of all the the amplitude of this frequency so maybe [01:15:16] amplitude of this frequency so maybe this one is a strong frequency this one [01:15:18] this one is a strong frequency this one is a strong frequency this one is a low [01:15:20] is a strong frequency this one is a low one and so on for every time step this [01:15:22] one and so on for every time step this is a spectrogram of an audio speech [01:15:25] is a spectrogram of an audio speech you're going to learn a little bit more [01:15:26] you're going to learn a little bit more about that so after I got the [01:15:28] about that so after I got the spectrogram which is better than the 1d [01:15:29] spectrogram which is better than the 1d signal for the network I would use an [01:15:32] signal for the network I would use an LSD M which is a recurrent neural [01:15:34] LSD M which is a recurrent neural network and add a sigmoid layer after it [01:15:37] network and add a sigmoid layer after it to get probabilities between zero and [01:15:39] to get probabilities between zero and one I would threshold them everything be [01:15:42] one I would threshold them everything be more than 0.5 I would consider that it's [01:15:45] more than 0.5 I would consider that it's a 1 everything last to 
zero. [01:15:48] I tried for a long time fitting this network on the data; it didn't work. But one day I was working on campus and I found a friend who was an expert in speech recognition. He has worked a lot on all these problems, and he knew exactly that this was not going to work; he could have told me. So he told me there are several issues with this network. The first one is that your hyperparameters in the Fourier transform are wrong: go on my GitHub and you will find what hyperparameters are used for this Fourier transform, specifically what sample rate, what window size, what frequencies are used. So that was better. Then he said another issue is that your recurrent neural network is too big; it's super hard to train, and instead you should reduce it. So he told me to use a convolution to reduce the number of time steps of my audio clip.
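As an aside, the spectrogram he describes can be sketched with plain NumPy. This is a minimal sketch: the frame length, hop size, and the toy 440 Hz sine below are illustrative choices, not the actual hyperparameters from his GitHub.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram of a 1D signal via a short-time Fourier transform.

    Returns a (num_freq_bins, num_frames) matrix: one axis is frequency,
    the other is time, exactly the picture described in the lecture.
    """
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # keep only the magnitude of the real-input FFT (frame_len // 2 + 1 bins)
        frames.append(np.abs(np.fft.rfft(frame)))
    # stack frames as columns: rows = frequency bins, columns = time steps
    return np.stack(frames, axis=1)

# a 440 Hz sine sampled at 8 kHz for one second
sr = 8000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(sig)
```

With these settings the frequency resolution is 8000/256 = 31.25 Hz per bin, so the sine's energy lands around bin 14.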
[01:16:42] You will learn about all these layers later. And also use batch norm, which is a specific type of layer that makes the training easier. Finally, you get your sigmoid layer and you output zeros and ones, but because the number of output time steps is smaller than the input, you have to expand it. So you need an expansion algorithm, just a script that expands every zero into two zeros, let's say, every one into two ones, and so on. And now I get another architecture that I managed to train within a day, and this was all because I was lucky enough to find the expert and get advice from this person. So I think you will run into the same problems as I ran into during your projects. The important thing is to spend more time figuring out who is the expert who can tell you the answer, rather than trying out random things. I think this is an important thing to think about. Okay,
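That expansion step is simple enough to write down. A minimal sketch: the factor of two matches his "every zero into two zeros" example, but in practice it would be the ratio of input to output time steps.

```python
def expand_labels(labels, factor=2):
    """Expand each predicted label into `factor` copies, so the shorter
    output sequence lines up with the longer input time axis
    (e.g. every 0 becomes two 0s, every 1 becomes two 1s)."""
    expanded = []
    for label in labels:
        expanded.extend([label] * factor)
    return expanded

print(expand_labels([0, 1, 0]))  # → [0, 0, 1, 1, 0, 0]
```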
so don't give up, and also use error analysis, which we're going to see later. [01:17:44] We have two more minutes, so I'm not going to go over this one in detail; I'm just going to talk about it quickly. There's another way to solve word detection, and that is to use the triplet loss algorithm: instead of using anchor, positive, and negative faces, you can use audio speech clips of one second. The anchor is the word "activate", the positive is the word "activate" said differently, and the negative is another word. You will train your network to encode "activate" into a certain vector and then compare the distance between vectors to figure out whether "activate" is present or not. Okay, we have about two more minutes. [01:18:30] So, just to finish with two more slides: now that you've seen some loss functions, I want to show you another one, and I want you to tell me what application this beautiful loss corresponds to.
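Before the answer, a quick sketch of the triplet-loss setup he just described. The two-dimensional embeddings here are made-up stand-ins for what a trained encoder would produce, and the margin value is an illustrative choice.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embedding vectors: pull the anchor toward the
    positive ("activate" said differently) and push it away from the
    negative (another word) by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance anchor-positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance anchor-negative
    return max(0.0, d_pos - d_neg + margin)

# stand-in embeddings for three one-second clips
a = np.array([1.0, 0.0])   # anchor: "activate"
p = np.array([0.9, 0.1])   # positive: "activate", said differently
n = np.array([0.0, 1.0])   # negative: some other word
loss = triplet_loss(a, p, n)  # near-zero: positive is already much closer
```

At detection time, a clip whose embedding lies within some distance of the "activate" vector is declared a match.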
[01:18:49] This is one of the most beautiful losses I've seen in my life. So, can someone tell me what the application is? What problem are we trying to solve if we use this loss function? [01:19:07] Speech recognition? No, it's not, but good try. Yes? Regression: that's true, it's a regression problem, but it's a specific regression problem. Bounding boxes? Good. Bounding boxes, object detection: this is object detection. So I put the paper here, you can check it out, but how do you know that it's object detection? Oh, you've done it before, okay. So this is the loss function of a network called YOLO, and the reason you can tell it's about bounding boxes is that if you look at the first term, you will see that it's comparing x-true to x-predicted and y-true to y-predicted: this is the center of a bounding box, (x, y). The second term is w and h, which stand for the width and height of a bounding box, and it's trying to minimize the distance between the true bounding box and the predicted bounding box, basically. The third term has an indicator function for objects; it's saying that if there is an object, you should have a high probability of objectness. The fourth term is saying that if there is no object, you should have a low probability of objectness. And finally, the final term is telling you that you have to find the class that is in this box: is it a cat, is it a dog, is it an elephant, whatever. So this is an object detection loss function. [01:20:38] Actually, do you know why you would have a square root here? The reason we have the square root is that you want to penalize errors on small bounding boxes more than on big bounding boxes. So if I give you an image with a human like that, and there are cats like this, you can have something like this.
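That square-root point can be checked numerically. The box widths below are made up for illustration: the same two-pixel error costs more, after the square root, on a small box than on a large one.

```python
import math

def sqrt_width_error(w_true, w_pred):
    """YOLO-style width term: squared difference of square roots."""
    return (math.sqrt(w_true) - math.sqrt(w_pred)) ** 2

# the same 2-pixel error on a small (cat-sized) box vs. a large (human-sized) box
small = sqrt_width_error(10, 12)    # ≈ 0.091
large = sqrt_width_error(100, 102)  # ≈ 0.010
# the raw squared errors (12-10)**2 and (102-100)**2 are identical,
# but after the square root the small box is penalized much more
```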
[01:21:11] The boxes inside are the ground truth; they're very tight boxes. And the outer boxes are the predictions. What's interesting is that a two-pixel error on these cats is much more important than a two-pixel error on this human, because the box is smaller. So that's why you use a square root: to penalize the errors on small boxes more than on big boxes. Okay, and finally, the final slide. Let's go over what we have for next week. You have two modules to complete by next Wednesday: C1M3, with the following quiz and programming assignments, and C1M4, with one quiz and two programming assignments; you're going to build your first deep neural network. This is all going to be on the web, it's already on the website, and we'll publish the slides.
[01:22:10] Now, TA project mentorship is mandatory this week. So, TA project mentorships are mandatory this week, to start; then the week before the project proposal... no, after the proposal, after the project milestone, and before the final project submission. Okay. And in Friday TA sections you're going to do some neural style transfer and art generation. Fill in the AWS form; I don't know if it's been done yet. We're going to try to give you some credits with GPUs for your projects. Okay, thanks guys. ================================================================================ LECTURE 003 ================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 3 - Full-Cycle Deep Learning Projects Source: https://www.youtube.com/watch?v=JUJNGv_sb4Y --- Transcript [00:00:05] All right, everyone. Okay, I guess we're live. So, as Aarthi was saying, please enter your SUNet ID; we can bring this up again at the end of class. Today we're just taking another, like, what, twenty seconds, and then we'll go onto the main discussion. All
right. [00:00:37] So, what I want to discuss with you today is maybe what I'm going to call full-cycle deep learning applications. And so, I think this Sunday you'll be submitting your proposals for the class projects you'll do this quarter, and in a lot of what you learn about machine learning projects, you learn how to build machine learning models. What I want to do today is share with you the bigger context of how a machine learning model, you know, a neural network you might train, fits in the context of a bigger project. So what are all the steps? Right, just as if you're building a software product: you know, you take other classes that teach you how to build a website, for example. But a successful product requires more than just building a website, right? So what are the other things you need to do to actually do a successful
software project, and in this case a successful machine learning application? [00:01:58] (Test, test... could you turn up the audio? How's this? Nope, can't hear me at all? Oh, I think I'm broadcasting, I can hear myself. Great, okay, you can hear me now? Great, thank you. All right, thank you.) All right, so the overview of this is to share with you full-cycle machine learning: you learn a lot about how to build deep learning models, but how does that fit into a bigger project? Right, just as if you're taking a class on building a website: then, great, you know how to code a website, and that's really valuable, but what are all the things you need to do to make a successful website, to build a project that involves launching a website or a mobile app or whatever? So as you plan for your class project
proposals, due this Sunday: if you're doing an application project that fits in the context of a bigger application, also take some of these steps in mind. Right, so, you know, these are what I think of as the steps of an ML project. Really, maybe not a class project, but maybe a serious machine learning application, right? And I think, you know, I've built a lot of machine learning products over several years, so some of these are also things that I wish I had known, like, you know, many years ago. Um, one: this is maybe kind of obvious, but, you know, select a problem, and let's say for the sake of simplicity that you use supervised learning. Right, it turns out that for the CS230 class projects, I think more than 50% of the projects tend to use supervised learning, and then there are also other projects that end up using GANs, which we'll talk about later this quarter, or other things. But I think, you know, let's
say you use supervised learning to build a machine learning application. And I think for today I'm going to use as a running example building a voice-activated device, right? So, you know, I don't know actually how many of you have, like, a smart speaker in your home, like a voice-activated device in your home... in the U.S., well, not that many of you... okay, cool. Yeah, so I think, you know, the Amazon Echos, Google Homes, the Apple Siris, or in China, one of my favorites is built by Baidu. But let's say for the sake of argument that you want to build a voice-activated device, and I'm going to use that as a running example. And so, in order to build a voice-activated device, and again, I'm not going to use any of the commercial brands' wake words, like "Alexa", "Okay Google", or "Hey Siri", or I guess in China it was "Xiaodu Xiaodu", which means kind of
roughly "little Du". Um, but let's use a more neutral word: let's say you wanted to build a device that responds to the word "activate". And you're actually going to implement this as a problem set later this quarter. So you want to build a... (Yeah, okay, no volume... let's see, is this better? Yes? Yeah, this is better. Okay, cool, thank you. Ironic that we don't have speech recognition and the volume is higher now.) Um, so let's say you want to build a voice-activated device. So the key component, the key machine learning, deep learning component, is going to be a learning algorithm that takes as input an audio clip and outputs y to detect what's sometimes called the trigger word. [00:05:53] (Did I go soft again? Okay, this'll be great. All right.) And the output y is, you know, zero or one, to
detect the trigger word, such as "Alexa" or "Okay Google" or "Hey Siri" or "Xiaodu Xiaodu", or "activate", or whatever wake word or trigger word you choose, right. Um, and so [00:06:16] step one is: select a problem. And then, in order to train a learning algorithm, you need to get labeled data, if you apply supervised learning. And then you design a model, and use backprop or some of the other algorithms you learn about (momentum, Adam, various optimization algorithms, gradient descent) to train the model. And then maybe you test it on your test set. [00:06:54] And then you deploy it, meaning you start selling these smart speakers and, you know, putting them into, hopefully, end users' homes. And then you have to maintain the system; I'll talk about this later as well. And, and this is not chronological, but one thing that's often done, and I want to talk about it at the end instead, so it's not really step 8, is QA, quality
assurance, which is an ongoing process, right. And so, let's see. So if you want to build a product, if you want to sell a machine learning product, these are maybe some of the key steps you need to work on. Some observations: training the model is often a very iterative process. Every time you train the machine learning model, you find that, you know... I can almost guarantee that whatever you do, it will not work, at least not the first time, right? And so you find that, even though I've written this as a sequence of steps, when you train the model you're often going: nope, that neural network architecture didn't work, I need to increase the number of hidden units, or change the regularization, or switch the RNN, or switch to a totally different architecture. And sometimes you train the model and go: nope, that didn't work, I need to get more data, right? And so this
is often a very iterative process where you're cycling through several different steps here. [00:08:23] And then I think one distinction that you have not yet learned about in the Coursera, in the deeplearning.ai videos, is how to split up the data into train, dev, and test sets. So I'm going to simplify those details for now, but just as a foreshadowing: what you'll learn later in the deeplearning.ai videos is how to take a dataset and split your entire training set into a set you train on and a set that you actually cross-validate on during development, called a dev set, or development set, or hold-out cross-validation set, plus a separate test set. So you'll learn about this later, but I'm just simplifying a little bit for today. Okay. [00:09:10] So, um, I think the first thing I want to do is ask you a question.
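As a concrete illustration of that split, here is a minimal sketch; the 80/10/10 fractions and the use of a fixed shuffle seed are illustrative choices, not rules from the lecture.

```python
import random

def train_dev_test_split(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a dataset and split it into train/dev/test.

    The dev set (also called hold-out cross-validation set) is used
    during development; the test set is kept separate for the final
    evaluation.
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)
    dev = examples[:n_dev]
    test = examples[n_dev:n_dev + n_test]
    train = examples[n_dev + n_test:]
    return train, dev, test

train, dev, test = train_dev_test_split(range(100))
```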
So we're going to talk through many of these steps. And it turns out that what a lot of machine learning classes do, and do a good job teaching, is focusing on maybe these three steps, or maybe these four steps, right? This is the heart of machine learning: how do you build and train a model. And what I want to do today is spend more time talking about steps one and six and seven, and then just a little bit of time talking about the others, because you kind of need to do the other steps as well if you want to build a deep learning product or a machine learning application, okay. Um, so here's sort of a discussion question. Um, I'm actually curious, if you are selecting a project to work on... [00:10:05] actually, so don't answer this yet; I'll tell you what the question I'm going to ask is, which is: [00:10:15] all right, what properties make for a good candidate
deep learning project. [00:10:20] But don't answer yet; I want to say a few more things before I invite you to answer. All of you, for the last few days, I hope, have been thinking about what project you want to do for this course, and what I want to do is just discuss some properties of a good project to work on, and what is maybe not a good project. And think of this as your chance to give your classmates advice, right, on what your classmates should think about when they decide what is a good project to work on. [00:10:46] So what I want to do today is use this voice-activated thing as a motivating example. There's actually one project I thought of working on but decided not to work on, and that's a voice-activated device. So it turns out
[00:11:09] that these voice-activated devices, Amazon Echos, Google Homes and so on, are taking off quite rapidly in the US and around the world. It turns out that one of the, you know, significant pain points of these devices is the need to configure them, right, to set them up for Wi-Fi. Now, I've done a lot of work on speech recognition; I led the Baidu speech system, I've published papers on speech recognition, and I have one of these devices in my home. [00:11:40] Actually, I have an Amazon Echo in my living room, but even to this day I have configured exactly one light bulb to be hooked up to be controlled by my Echo, because the setup process, not blaming any company, is just difficult: to hook up, you know, a Wi-Fi enabled light bulb, and then to set it up so that your smart speaker or whatever can, when you say, you know, "smart
device, turn off the lamp." [00:12:05] So I have one light bulb in my living room, right, that I can turn on and off, and that's it, even as a speech researcher. [00:12:14] So one application that I think is actually worth working on is to build an embedded device that you can sell to lamp makers, so that, I don't know where you buy your lamps from, but you could buy a desk lamp where there's already a built-in microphone, so that without needing to connect this thing to Wi-Fi, you know, hey, here's a twenty dollar desk lamp, put it on your desk, and you can go home and say "desk lamp, turn on" or "desk lamp, turn off." [00:12:53] Then I think that would help a lot more users get voice-activated devices into their homes. And if you want to turn on a desk lamp, it
[00:13:04] is actually not clear to me that you want to turn to a smart speaker and say "hey smart speaker, please turn on that lamp over there"; it may be easier to just talk directly to the desk lamp and tell it to turn on or turn off. [00:13:16] And so some friends and I evaluated this, and we actually thought that this could be a reasonable business: to build embedded devices to sell to lamp makers or other device makers, so that they can sell their own voice-activated devices without needing this complicated Wi-Fi setup process. [00:13:32] And so to do this, you would need to build a learning algorithm, and have it run on an embedded device, that just inputs an audio clip and outputs, you know, whenever it detects the wake word. And instead of the wake word being "activate," the wake word would be "lamp turn on" or "lamp turn off"; you need two wake words, or trigger words,
[00:13:52] one for when to turn it on and one for when to turn it off, right. And the other thing that I think would make this work is to give these devices names. So if you have five lamps, or two lamps, you need a way to index into these different desk lamps. [00:14:07] So let's say you decide, for your project, you know, to have a little switch here, so this lamp could be called John or Mary or Bob or Alice, like a four-way switch, so that depending on where you set this four-way switch you can say, you know, "John, turn on," right, if you decide to call this lamp John. I guess we'd give lamps a few different names so you don't have every lamp with the same name. [00:14:34] Okay, so what I'm going to do is use this as a motivating example, as a possible project. Oh, and I'm not working on this; if any of you want to build a startup doing this, go for it. I think my teams and I have way better
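The detector being described, input an audio clip, output a signal whenever a trigger phrase is heard, can be sketched roughly as follows. This is only an illustrative stand-in, not the course's starter code: the spectrogram parameters, the `TinyTriggerDetector` name, and the per-frame logistic scorer (where a real system would use a small recurrent or convolutional network) are all my assumptions.

```python
import numpy as np

def spectrogram(audio, frame=256, hop=128):
    """Magnitude STFT of a 1-D signal -> (num_frames, frame//2 + 1) array."""
    window = np.hanning(frame)
    frames = [audio[i:i + frame] * window
              for i in range(0, len(audio) - frame, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

class TinyTriggerDetector:
    """Per-frame logistic scorer for two trigger words ('turn on' / 'turn off').
    Weights are random here; training them is the actual project."""
    def __init__(self, n_freq, n_words=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_freq, n_words))
        self.b = np.zeros(n_words)

    def __call__(self, audio):
        S = spectrogram(audio)                 # (frames, n_freq)
        logits = S @ self.W + self.b           # (frames, n_words)
        return 1.0 / (1.0 + np.exp(-logits))   # per-frame probabilities

# One second of fake audio at an assumed 16 kHz sample rate.
detector = TinyTriggerDetector(n_freq=129)     # 129 = 256 // 2 + 1
probs = detector(np.random.default_rng(1).standard_normal(16000))
```

At deployment the device would run something like this over a sliding window and fire when one column of probabilities stays above a threshold for a few consecutive frames.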
ideas, so we want to do other things. [00:14:51] But I don't see anything wrong with this; I think this actually could be a reasonable thing to pursue yourself, and I'm not doing it, so you're very welcome to if you want. [00:15:00] Okay, so now the question that I want to pose to you is: when you're brainstorming project ideas, you know, like this idea or some other idea, what are the things you would want to watch out for? What are the properties that you would want to be true in order for you to feel good proposing this as a CS230 project? [00:15:18] So you should take a minute and write this down. If a friend is asking you, "what are the things I should look at to see if something is a good project," what would you recommend to them? So feel free to just write down a few keywords, and then we'll see what people say,
[00:15:38] and then I'll tell you what I tend to look out for when I'm selecting projects; I have a list of five points. Let's take, like, I don't know, two minutes. [00:15:53] Oh, sorry, this is not activated, you're not able to answer; it's not letting you enter answers. Okay, just checking, yeah, it won't connect to the internet. Aarti, any ideas? Oh, I see, okay. All right, let me try that. It's working now? Okay, thank you, yes, thank you. [00:17:10] So take like two minutes to enter, and I think I can configure this to let you enter multiple answers. [00:18:07] All right, another one minute thirty seconds. [00:19:04] Okay, three, two, one. [00:19:10] Well, maybe in hindsight that wasn't the best visualization; can people see this? [00:19:41] All right, so, reading some of these off: lots of data, some of these are small, human-doable, number of examples, doable in two months, novel, clear objective, practical, useful, okay, finish in time, real-life problem, useful, hasn't been done,
[00:20:08] computationally tractable, yeah, generalization, cool, great. [00:20:16] Let me make some comments on these; I think this is pretty good. I had a list of five bullet points, so maybe I'll just share with you my list of five, which is, I mean, just some things I encourage you to pay attention to; this may or may not be the best criteria. But interest: hopefully you'd work on something that you're actually interested in. [00:20:45] And then, right, data availability, which many of you cited, is a good criterion. One of the ways that Stanford class projects sometimes do not go well is if students spend a month trying to collect data, and after a month still haven't found the data they need, and then, you know, there's a lot of wasted time. [00:21:06] One thing that I would encourage you to consider as well is domain knowledge,
[00:21:17] and I think that if you are a biologist and have unique knowledge of some aspect of biology to which you want to apply machine learning, that will actually let you do a very interesting project, right, that is actually difficult for others to do. [00:21:34] And I think more generally, as advice for navigating your careers: because in AI, machine learning, deep learning, there are so many people wanting to jump into machine learning and deep learning, let me actually give an example. I sometimes talk to doctors or radiology students, including at Stanford and other universities, radiology students that want to learn about machine learning, right, because they hear about, you know, deep learning maybe someday affecting radiologists' jobs, and so they want to be part of deep learning. [00:22:01] And my career advice to them
[00:22:08] is usually not to forget everything they learned as a doctor, try to, you know, do machine learning 101 from scratch, and just become a CS major. I think that path can work, but I think where radiologists could do the most unique work, that allows them to make the most unique contribution, is to use their domain knowledge of healthcare and radiology and do something in machine learning applied to radiology, right. [00:22:38] And so, all right, how many millennials are there in this class? What does that mean? All right, this is rendered really large; yeah, I think it's because it's a word cloud, so everything is counted by word frequency, right. The "money" thing, I don't know, I have very mixed feelings about that. [00:23:04] All right, but actually I know that some of you are taking, you know, deep learning because you work in a different discipline and you want to do something in
this hot, new, exciting thing of machine learning. [00:23:15] And I think whatever discipline you're in, if you have domain knowledge about some other area, you know, education, civil engineering, biology, law, taking deep learning allows you to do very unique work applying machine learning to your domain, right. [00:23:38] Let's see. I think, well, I called it utility, but several of you mentioned as well something that has a positive impact, that helps other people; and, I don't know, money could be an aspect of utility, but maybe not the most inspiring one. [00:23:58] And then I think one of the biggest challenges we face in the industry today, frankly, is still actually good judgment on feasibility. So today I still see too many leaders, sometimes CEOs of large companies, that stand on stage and announce to the whole world, you know, we're going to do this machine learning project
to do this by this deadline, [00:24:20] and then twenty minutes later I talk to their engineers and they say no, there's no way, it's not happening; whatever the CEO just announced on stage, the engineering organization is not doing it and knows it's impossible. So I think one of the biggest challenges is actually feasibility. [00:24:34] In fact, I actually chatted with Aarti about the TA office hours, and I know that a lot of you have been thinking about applying end-to-end deep learning, right: you know, can you input any X and output any Y and do that accurately? And sometimes it's possible and sometimes it's not, and it still takes relatively deep judgment about what neural networks can and cannot do with a certain amount of data, which you may or may not be able to acquire, in order to do some of these things. [00:25:04] So I think throughout this quarter you'll gain much deeper judgment as
well on [00:25:10] what is feasible. And I guess one last thing: I once knew a CEO of a very large company that actually gave his team these instructions; he said, "I want you to assume that AI can do anything." And I think that had an interesting effect, I guess. Yeah, cool. [00:25:38] All right, so step one was select a project; I hope when you're selecting a project you keep some of those things in mind. Step two is get data. [00:25:55] So what I want to do is pose a second question and then have some of you discuss it. Let's say that you're actually working on this, you know, smart voice-activated embedded device thing, right. So let's say that you and your friends want to found a startup, to train a deep learning algorithm to detect, you know, phrases like "John, turn on," "Mary, turn off," or "Bob, turn off," or whatever, to sell
to device makers, [00:26:21] so that they can have low-cost embedded voice detection that doesn't require a complicated Wi-Fi setup process, right. So let's say that you want to do this; you need to collect some data in order to start training a learning algorithm. [00:26:32] Okay, so the second question I pose to you is a question in two parts, but answer both at the same time, which is: let's say you actually propose this for your CS230 project this Sunday, and then you start work on it, you know, like on Monday, or maybe you even start working on it today, before the proposal. How many days would you spend collecting data, and how would you collect the data? [00:27:03] And actually, how many of you have participated in an engineering scrum, if you know what that means? Oh, okay, a few of you, you've seen the industry. Okay, all right, so
engineering estimation: [00:27:13] when you estimate how long a project takes, one of the common practices is to use the Fibonacci sequence to estimate how long the project will take, right. And so the Fibonacci sequence is 1, 1, 2, 3, 5, 8, 13, and so on; it's roughly exponential, but doesn't grow as fast as powers of 2, and Fibonacci numbers are cool, right. [00:27:36] So what I want you to do... let me see if I have a configuration here with speech bubbles. Okay, yeah, that's good. [00:27:58] All right, so what I'd like you to do is, in the text answer, write two things: one is a number, how many days do you think you'd spend on collecting data, you and your teammates, if you were actually doing this project; and then, how would you go about collecting the data? Okay, so take like another two minutes to write in an answer. [00:28:28] Oh, I'm sorry, they're still not activated. That's what I'm trying to hit. Oh, you think that it's not helping?
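The estimation scale mentioned here is easy to generate; a quick sketch (the `fib_scale` helper name is mine):

```python
def fib_scale(n):
    """First n points of the Fibonacci estimation scale: 1, 1, 2, 3, 5, 8, 13, ..."""
    a, b, points = 1, 1, []
    for _ in range(n):
        points.append(a)
        a, b = b, a + b
    return points

scale = fib_scale(10)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# Successive ratios settle toward the golden ratio (~1.618): the scale grows
# exponentially, but slower than doubling, as noted in the lecture.
ratios = [later / earlier for earlier, later in zip(scale, scale[1:])]
```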
[00:28:55] All right, damn, it's definitely not helping. All right, let's do this: write down your answer on a piece of paper first, and then type it in once this is working. So the two questions are: how many days, pick a number from the Fibonacci sequence, and how would you collect the data? [00:29:15] Oh, okay, yeah, let's swap out my computer. Oh, actually, yeah, if Aarti's computer is working I should go ahead; oh yeah, I can just present, yeah, let's plug in your laptop, so you just use your laptop. [00:29:43] I don't know if it was a network problem or a web browser problem; I started using Firefox recently, in addition to Chrome and Safari, and that was Firefox; I'll try other web browsers later. Okay, thank you, thanks Aarti. [00:30:10] All right, maybe people can take another minute from now; I'll just extend the time a bit. [00:31:04] All right, another ten seconds. [00:31:19] Let's see, let me show you people's answers. Okay, all right, well, 365; so there's
a lot of variance in the answers, right. [00:31:43] I don't know, "download from online": well, it depends on what data you want. It turns out that if you're trying to find data for phrases like "John, turn on," that data doesn't exist online. We tried to find audio clips of the word "activate"; there are some websites with single words pronounced, but not a lot of audio clips, actually. [00:32:02] So say the wake word is the word "activate": there are some websites where you can download maybe ten audio clips of a few people saying "activate," but it's quite hard to find hundreds of examples of different people saying the word "activate." [00:32:24] "Five days." All right, so let me suggest that you discuss with each other, in small groups, what you think would be the best strategy: how many days you'd spend collecting the data, and how
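Since hundreds of natural recordings of a rare wake word are hard to find online, a common workaround is to synthesize training examples: record a handful of wake-word clips, then overlay them at random offsets onto longer background-noise recordings. A rough numpy sketch; the sample rate, the stand-in arrays, and the `overlay` helper are my illustrative assumptions, not anything prescribed in the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
SR = 16000  # assumed sample rate in Hz

def overlay(background, clip):
    """Mix `clip` into `background` at a random offset; return the mixed
    audio plus the (start, end) sample range where the wake word occurs,
    which doubles as the training label."""
    start = int(rng.integers(0, len(background) - len(clip)))
    mixed = background.copy()
    mixed[start:start + len(clip)] += clip
    return mixed, (start, start + len(clip))

# Stand-ins for real recordings: 10 s of noise, a 1 s "activate" clip.
background = 0.1 * rng.standard_normal(10 * SR)
activate = np.sin(2 * np.pi * 440.0 * np.arange(SR) / SR)

example, (start, end) = overlay(background, activate)
```

Each recorded clip can be reused many times with different offsets and backgrounds, so a small number of recordings can yield hundreds of labeled training examples.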
you would go about collecting it — and try to convince the people next to you of your answer. And before I ask you to start discussing, I want to leave you with one thought, which is: how long do you think it will take you to train your first model? If it takes you a day or two to train your first model — and it might take a couple of days, especially if you download an open-source deep learning package and train the model with it — so if the amount of time needed to collect data is X, followed by two days to train your first model, what do you think X should be? Why don't you spend, like, two minutes discussing with each other and compare your answers — there's very large variance, right? Once you've discussed — actually, if the
people sitting next to you are your project partners, you should discuss with them how many days you think you should spend collecting data, and how you would collect your data. Okay, let's take two minutes to discuss. [00:35:04] [Applause] [00:35:52] All right, guys — wow. All right, hey guys. All right, a lot of exciting discussion. So actually, how many of the groups wound up on the low end — how many of you convinced each other that maybe it should be like three days or less? Oh, just a few of you. How come? Someone say why. [Student answers.] Oh yeah — you want to use the data to test how the model works before you make a massive investment, you said. And then — anyone on the high end, like 13 days or more? Very few. How come? Anyone with insights you want to share with the whole class — what were you discussing so excitedly? [00:37:11] [Student:] We acknowledged
that it can actually take a long time, especially for this problem — based on the numbers you gave us it doesn't seem feasible — but they could take, say, movie clips, something with subtitles, and generate labeled sound from them. As far as we were concerned, we just wanted to get our system live, and collecting the data ourselves would take a long time. [Andrew:] Yeah — right. And then a company could build a system to look at subtitled videos, like YouTube videos with captions or something, if there's appropriately Creative Commons data there that you could use. [00:37:51] Yeah. So let me tell you my bias — I'll just tell you what I would do if I were working on this project. Well, one caveat: I have done a lot of work on speech recognition previously, so pretend this was my first project. I would probably spend one to two days collecting data — kind of on the short end, right? And I think that, you know —
one of the reasons is that machine learning — that's the circle I grew up in — is actually a very iterative process, where until you try it, you almost never know what's actually going to be hard about the problem. And so if I were doing this project — you want to see what I would do, okay, now that you've thought about this project a bunch, including, you know, trying to validate market acceptance and so on — I would get a cheap microphone, or use the built-in laptop microphone, or buy a microphone off Amazon or something, and go around Stanford campus, or go to your friends, and have them just say: "Hey, do you mind saying into this microphone the word 'activate,' or 'John, turn on,' or whatever?" — and collect a bunch of data that way. And then, within one or two days, you
should be able to collect at least hundreds of examples, and that might be enough of a dataset to start training a rudimentary learning algorithm to get going. Because if you have not worked on this problem before, it turns out to be very difficult to know what's going to be hard about the problem. So — what's going to be hard? Highly accented speakers, right? Or is what's going to be hard background noise? Or is what's going to be hard, you know, confusing "turn on" with "turn off" when you hear "John, turn..."? When you build a new machine learning system, it's very difficult to know what's going to be hard and what's going to be easy about the problem. What's going to be difficult is far-field — which is the technical term for when the microphone is very far away. So it turns out that, you know, if we turn on the microphone on my laptop now, for example, the laptop — which is like
three meters away from me — will be hearing voice directly from my mouth as well as voice bouncing off the walls. So there's a lot of reverberation in this room, and that makes speech recognition harder. Humans are so good at processing out reverberant sounds — reverberations — that you almost don't notice it, but a learning algorithm in a room like this one sometimes has problems with reverberations, or echoes bouncing off the hard walls. And so, depending on what your learning algorithm has trouble with, you will then want to go back to collect very different types of data, or explore very different types of algorithms. Or maybe the problem is that sometimes the volume is just too soft, in which case, you know, maybe you need to do something else and normalize all the volumes, or buy more sensitive microphones, or something. So it turns out that when building most
machine learning applications — unless you have experience working on that exact problem; I've actually worked on this problem before, so I have a sense of what's hard and what's easy — when you work on a new project for the first time, it's very difficult to know what will be hard and what will be easy. And so my advice to most teams is: rather than spending, say, 20 days to collect data and then two days to train a model — it's often by training a model, and then seeing what examples it gets wrong, where the algorithm fails, that you get the feedback that lets you either collect more data or redesign the model, or try something else. And if you can bring the data-collection period down to be more comparable to how long you end up taking to train your model, then you can start iterating much more rapidly on actually improving your model.
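The iterate-fast loop being described — train something crude quickly, do error analysis on where it fails, and let that drive the next round of data collection — can be sketched in a few lines of Python. Everything below (the clips, the threshold "model," the condition tags) is invented for illustration; it's the shape of the loop that matters, not the numbers:

```python
# Sketch of the iterate-fast loop: train a crude model, then group its
# errors by recording condition to decide what data to collect next.
# All data and the "model" here are made up for illustration.

# Each clip: (mean_volume, recording_condition, is_wake_word)
clips = [
    (0.90, "close_mic", True),  (0.20, "close_mic", False),
    (0.85, "close_mic", True),  (0.15, "close_mic", False),
    (0.40, "far_field", True),   # far-field wake words come in quiet...
    (0.35, "far_field", False),  # ...and overlap with background noise
]

def crude_model(volume, threshold=0.5):
    """First-pass 'detector': call it a wake word if it is loud enough."""
    return volume > threshold

def error_rate_by_condition(data):
    """Error analysis: fraction of misclassified clips per condition."""
    errors, totals = {}, {}
    for volume, condition, label in data:
        totals[condition] = totals.get(condition, 0) + 1
        if crude_model(volume) != label:
            errors[condition] = errors.get(condition, 0) + 1
    return {c: errors.get(c, 0) / totals[c] for c in totals}

rates = error_rate_by_condition(clips)
worst = max(rates, key=rates.get)  # the slice to go collect more data for
print(rates, "-> collect more:", worst)
```

Here the far-field slice fails while the close-mic slice is fine, so the next day of data collection goes to far-field clips — something you could not have known before training the first crude model.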
Right — oh, and maybe one rule of thumb that I tend to recommend for most class projects: if you need to spend up to a week to collect data, you know, maybe that's okay — but you can get going even more quickly, and I would maybe even more strongly recommend that. There have been so few examples in my life where the first time I trained a learning algorithm, it worked. It pretty much never happens. Yeah — it happened once, about a year ago, and I was so surprised that I still remember that one time. And so machine learning development is often a very iterative process — and often datasets are collected through sweat and hard work, right? So literally — actually, if I were working on speech and wanted to get going quickly — I would probably just have myself, or my team members, run around and find people and
ask them to speak into a microphone, and record audio clips that way — and then, only once you've validated that you need a bigger dataset, move to more complicated things, like setting up an Amazon Mechanical Turk job to crowdsource, which I've also done. I've also had very large datasets collected via Amazon Mechanical Turk — but only at a later stage of the project, once you understand what you really need. So as you start work on your class projects, maybe keep that in mind. [00:43:27] So, one other tip — machine learning researchers, on average, tend to be terrible at this, but I'll give this advice anyway. When you're going through this process — yes, on the day you design a model, a literature search would be very helpful, you know, to see what algorithms others are using for this problem. It turns out the literature is actually quite immature: there isn't
a convergence on a well-established set of standard algorithms for trigger-word detection in the literature right now — people are still making up algorithms — so if you do the survey, you'll find that to be the case. But you need to train an initial model, and in most machine learning applications you go through this process multiple times. So one tip that I would recommend: keep clear notes on the experiments you've run. Because so often, as we train a model, you see — oh, this model works great on American-accented speakers but not on British-accented speakers. I was born in the UK, so I get to use British accents as my running example — if you're from a different part of the world, think of a different global accent; since I'm from the UK, I'll just pick on British accents, I guess. Keep clear notes on the experiments you've run, because what
happens in every machine learning project is: after a while, you have trained 30 models, and then you and your team are wondering — "oh yeah, we tried that idea two weeks ago; it didn't work." And if you have clear notes from when you actually did that work two weeks ago, then you can refer back rather than having to rerun an experiment. Oh — the other thing that some groups do is keep a spreadsheet that tracks what learning rate you used, what number of hidden units, what this setting or that setting was — or keep it in a text document — which will make it easier to refer back and know what you tried earlier. This is one piece of commonly given advice, and it's one of those things that every machine learning person knows we should do, but on average we're very bad at doing. But the times I do manage to keep good notes, it saves tons of time
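The spreadsheet habit can be as lightweight as appending one row per training run to a CSV file. A minimal sketch — the column names are only examples, not a prescribed schema:

```python
import csv
import io

# One row per training run, so "did we already try lr=0.01 with 64
# hidden units?" becomes a lookup instead of a memory test.
FIELDS = ["run", "learning_rate", "hidden_units", "dev_accuracy", "notes"]

def log_run(f, **row):
    """Append one experiment record to an open CSV file handle."""
    csv.DictWriter(f, fieldnames=FIELDS).writerow(row)

buf = io.StringIO()  # stands in for open("experiments.csv", "a")
csv.DictWriter(buf, fieldnames=FIELDS).writeheader()
log_run(buf, run=1, learning_rate=0.01, hidden_units=64,
        dev_accuracy=0.91, notes="fails on British accents")
log_run(buf, run=2, learning_rate=0.001, hidden_units=128,
        dev_accuracy=0.94, notes="added far-field clips")
print(buf.getvalue())
```

Two weeks later, "we already tried that idea" becomes a search through the log rather than a rerun of the experiment.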
Right — it beats trying to remember what exactly you tried two weeks ago. Okay. So, um, a lot of this class will be about this process: how to get data, develop the train/dev/test sets, design a model, train the model, eventually test the model, then iterate. A lot of this course is on this. So let me jump ahead to when you have a good-enough model and you want to deploy it. Okay — so step six, [00:45:54] I guess, is deployment. Now, this is one of the reasons I wanted to step through this example — going through a concrete example — I find that when you're learning about machine learning for the first time, it's often seeing what my teams tend to call war stories — stories of projects that others have built before — that provides the best learning experience. So, like, I have built speech recognition systems; it took me like a year or two
to do it. So rather than, you know, having you spend two years of your life building speech systems, if I can summarize the war story — right, tell you what the process is like — I'm hoping these concrete examples of what building these systems is like in, you know, large corporations can help you accelerate your learning without needing to get two years of on-the-job experience; you can just hear the salient points. Okay. Now, if you're deploying a system like this — and this is an actual, real phenomenon for deployed speech systems — you get the audio clip, you have a neural network, and, you know, it will output zero or one. And the neural networks that work well will tend to be relatively large — the model size, or its complexity, is relatively high. And if you look at the smart speakers in your
home, you'll recognize that a lot of them are edge devices, as opposed to doing purely cloud computation. Right — so we all know what the cloud is, and what an edge device is: an edge device is the smart speaker that's in your home, or the cell phone in your pocket. So edge devices are, you know, the things that are close to the data, as opposed to the cloud, which is the giant servers we have in our data centers. Right. So, um, because of network latency, and because of privacy, a lot of these computations are done on edge devices — like the smart speaker in your home, or like — I guess "Hey Siri" or "Okay Google" can wake up your cell phone, right? And so edge devices have much lower computational budgets and much lower power budgets — limited battery life, much less powerful processors than we have in our cloud data centers. And so it turns out that serving up a very large neural network is
quite difficult. Right — it's very difficult for, you know, a low-power, inexpensive microprocessor sitting in the smart speaker in your living room to run a very large neural network with a lot of hidden units, with all the parameters. And so what is often done is actually this: input the audio clip, and then have a much simpler algorithm figure out if, you know, anyone is even talking. Because the smart speaker in my living room, say, hears silence most of the day, right — because usually there's just no one at home; no voice. And then, only if it hears someone talking does it feed the audio into the big neural network that you've trained — ramping up to use a larger power budget — in order to classify 0/1. Okay — this component goes by many different names; the terminology is reasonably standard, but not totally standard, in the literature.
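That gating arrangement — a cheap always-on check that wakes the expensive network only when it fires — can be sketched as below. The threshold value, the frame format, and the stand-in "big network" are all made-up assumptions for illustration:

```python
# Two-stage pipeline sketch: a cheap energy check (the simple algorithm)
# gates the large wake-word network, so silence costs almost nothing.
EPSILON = 0.01  # made-up energy threshold; would be tuned on real audio

def simple_vad(frame, epsilon=EPSILON):
    """Cheap check: does this audio frame carry any sound energy?"""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > epsilon

def process(frame, big_network):
    """Run the expensive model only when the cheap check fires."""
    if not simple_vad(frame):
        return 0  # silence: skip the neural network entirely
    return big_network(frame)

calls = []
def fake_big_network(frame):  # stand-in for the trained wake-word model
    calls.append(len(frame))
    return 1

silence = [0.001] * 160    # ~10 ms of near-silence at 16 kHz
speech = [0.3, -0.4] * 80  # loud oscillating samples, clearly non-silent

print(process(silence, fake_big_network), process(speech, fake_big_network))
```

On the silent frame the large model is never invoked — which is the whole point on an edge device's power budget; only the loud frame reaches it.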
I'm going to call this VAD, for voice activity detection. It turns out that voice activity detection is a standard component in many different speech recognition systems. If you are using a cell phone, for example, VAD is a component that tries to figure out whether anyone is even talking — because if it thinks no one is talking, then there's no need to encode the audio and try to transmit it over the network, right? Um — and so the next question I want to ask you — and I thought this was timely — well, there's a couple of options, right? Option one is to build a non-machine-learning-based VAD system — voice activity detection system — which just, you know, checks whether the volume of the audio your smart speaker is recording is greater than some threshold epsilon, so the silence just gets thrown out. And option two is to train a small neural network to recognize, you know, human speech. And so my next
question to you is: if you worked on this project, would you pick option one, or would you pick option two? Right — as you work through it — oh, sorry: I said train a small neural network; so option two is a small neural network, or in some cases I've seen people use a small support vector machine as well, for those of you who know what that is — a small model that can be run with a low computational budget. It's a much simpler problem to detect whether someone is talking than to recognize the words they say, so you can actually do this with reasonable accuracy with a small neural network. But if you actually worked on this project for CS 230, which would you try? So, could we go to the next question? [00:51:48] Yeah — yeah, you can leave them on screen, I guess. And — I mean — why, are you afraid of the other projection? Cool. I'll just keep unlocking it periodically. [00:52:02] Are people able to vote? No? They're — no? Ah — well, I see: I guess you write so much code you
[00:52:49] All right, people are answering quickly. Another 20 seconds or so, if that's enough time to get your answers in. [00:53:20] All right, cool. [00:53:28] That's fascinating: there's a lot of disagreement in this class. Would people say why? Why would you choose option one, and why would you choose option two? I have a very strong point of view on what I would do, but I'm curious: why option one, and why option two? Go ahead.

[00:54:15] [Student:] With option two you can simplify the problem; it only activates the machine when it knows speech exists, so option two would be much better.

[00:55:14] [Student:] What about when someone's whistling? And if you're in a noisy place (I have a friend who lives next to a train station), option one would pick up a lot of the train noise. Whatever you use has to run constantly, so you want something low-power; it seems like option one is better, because it has to keep running constantly and you still want low power consumption.

[00:55:40] So let me show you the pros and cons. I think there are pros and cons to option one versus option two, which is why there were so many votes for both options. I personally would choose option one, but let's just discuss the pros and cons. First, option one is just a few lines of code. Yes, maybe option two isn't that complicated, but option one is even simpler. And actually, I would say that if I hadn't worked on this problem before, I would choose option one. Since I have experience with speech recognition, I know you eventually need option two, but that's only because I've worked on
this problem before. If it's your first time working on a speech application, I would encourage you, on average, to try the really simple, quick-and-dirty solution first and go from there. So let's see: how long would it take you to implement option one? I would say something like ten minutes, five minutes, I don't know. And how long would option two take? Four hours? One day? I don't really know, actually; let me just write one day, and I'm not quite sure.

[00:56:52] But if option one can be done in ten minutes, then I would encourage you to do that: go ahead and put the smart speaker in your home, or in your potential users' homes, and only when you find out that the dog barking is a problem, or the train on the railway, or whatever, go back and invest more in fixing it. In fact, it's true that maybe it's annoying that the dog barking keeps waking up the system, but maybe that's okay, because if the large neural network downstream then screens out all the dog barking, the overall system performance is actually just fine, and you now have a much simpler system.

[00:57:34] But it turns out that the reason you might eventually need to go to option two is that some homes are in noisy environments with constant background noise, and that would keep the large neural network running too frequently. So if you have a large engineering budget (and it's not small: the smart speaker teams have hundreds of engineers working on this), then with hundreds of engineers, sure, option two will perform better. But if you're a scrappy startup team, with three of you working on a class project, the evidence that you need that level of complexity is not that high. I would really do option one first, and use that to gather evidence that you really should make the investment in a more complex system before actually making that investment of days. Eventually, I think it's one day to your first prototype, and then eventually it will get more complicated.

[00:58:30] It turns out the other huge advantage of the simple method is the following. Frankly, this is actually one of the big problems and big weaknesses of machine learning algorithms and deep learning algorithms: when you build a system and ship a product, the data will change. I'm going to simplify the example a bit. I know Stanford is very cosmopolitan (Palo Alto is very cosmopolitan), so when you
collect data in this region, you get accents from people all over the world, because, well, that's Stanford. But just to simplify the example a little bit, let's say that you train on U.S. accents. Then, for some reason, when you ship the product, maybe it sells really well in the UK, and you start getting data with British accents. One of the biggest problems you face in practical deployments of machine learning systems is that the data you train on is not going to be the data you need to perform well on. I'm going to share with you some practical ideas for how to handle this, but this is one of those practical realities of machine learning that is actually not talked about much in academia, because it turns out the datasets we have in academia are not set up well for researchers to study and publish papers on this. I think we can design new machine learning benchmarks for it in the future, but it's one of those problems that is actually kind of underappreciated in the academic literature, even though it's a problem facing many, many practical deployments of machine learning algorithms.

[01:00:32] More generally, the problem is one of data changing. You might get new classes of users with new accents. Or you might train on data from, say, Stanford users, and maybe Stanford is not too noisy, or has certain characteristics, and when you ship to another city or another country, there's much more noise, or different background noise. [01:01:02] Or you start manufacturing the smart speaker, and to lower the cost of the speaker, they swap out the high-end microphone you used on your laptop to collect the data for a low-end microphone. This is a very common thing in manufacturing: if you can use a cheaper microphone, why not? And often, to human ears, the sound sounds just fine on the cheaper microphone. But as for your learning algorithm: I use a Mac, and the Mac has a pretty decent microphone, so if you train on data collected with your laptop or Mac, and the device eventually ships with a different microphone, it may not generalize well.

[01:01:44] So one of the challenges of machine learning is that you often develop a system on one dataset, and then, when you ship a product, something about the world changes, and your system needs to perform on a very different type of data than what it was trained on. What happens is that after you deploy the model, the world may change, and you often end up going back to get more data and redesign the model.
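To make the microphone-swap failure mode concrete, here is a hypothetical sketch (all numbers are invented for illustration): a detector tuned against one device's audio levels can silently break when the production hardware records more quietly, even though the speech sounds fine to a human ear:

```python
# Hypothetical illustration: epsilon was chosen while developing against
# laptop-microphone recordings; the cheaper production microphone records
# the same speech at a lower level. All values are made-up.
def is_speech(rms, epsilon=0.01):
    return rms > epsilon

laptop_speech_rms = 0.05                 # typical speech level in the dev data
cheap_mic_gain = 0.1                     # production mic records 10x quieter
shipped_speech_rms = laptop_speech_rms * cheap_mic_gain  # 0.005

print(is_speech(laptop_speech_rms))   # -> True: works in development
print(is_speech(shipped_speech_rms))  # -> False: the same speech is now missed
```

The fix here is trivial (recalibrate epsilon), but the same pattern with a trained model means going back for new data, which is exactly the maintenance loop described next.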
[01:02:23] This is the maintenance of a machine learning model. Let me give some examples. Web search: this happens all the time at multiple search engines. You train a neural network, or you train a system, to give relevant web search results, but then something about the world changes. For example, there's a major public event: some new person is elected president of some foreign country, or there's a major scandal, or just the internet changes. Actually, here's what happens in China, where new words get invented all the time. China has Baidu rather than Google, and the Chinese language is more fluid than the English language, so new words get invented all the time. The language changes, and so whatever you trained just isn't working as well as it used to. Or maybe a different company suddenly shuts their entire website out of your search index because they don't want you indexing it, so the internet changes, and what you had built doesn't work anymore.

[01:03:33] Or it turns out that if you build a self-driving car in California and then try to deploy those vehicles in Texas, traffic lights in Texas look very different from traffic lights in California. So even though it was trained on California data, a neural network trained to recognize California traffic lights actually doesn't work very well on Texas traffic lights. I'm trying to remember which way round it is; I think Texas has a different distribution of horizontal versus vertical traffic lights, for example. Humans don't even notice. You go, "oh yeah: red, yellow, green." But the learning algorithm doesn't actually generalize that well if you go to different locations. Go to a foreign country, and again the traffic lights, the signage, the lane markers all change.

[01:04:19] Or, one example I was working on earlier this week: manufacturing. I've been working on visual inspection of parts in factories. If you're doing visual inspection in a factory and the factory starts making a new component (they're making this model of cell phone, but cell phones turn over quickly, so a few months later they're making a different type of cell phone), or something changes in the vacuum process, so the lighting changes, or there's a new type of defect: the world changes.

[01:04:59] So what I'd like to do is actually revisit the previous question in light of this "the world changes" phenomenon. Let's say you've collected all your data from American-accented speakers,
and then you ship the product in the UK, and then for some reason you find all these British-accented speakers trying to use your smart speaker. Between these two algorithms, the non-machine-learning approach I described (the threshold) versus training a neural network, which system do you think would be more robust for VAD, voice activity detection?

[01:05:55] All right, take another 40 seconds or so. [01:06:36] All right, interesting. Does anyone want to comment? More people voted for non-ML; can someone explain why? [01:07:14] [Student:] For the VAD, the voice activity detection, if you just measure the volume, then it doesn't really depend on the accent, so non-ML might be more robust. Anyone else?

[01:07:28] All right, so let me show you what I thought. It turns out that if you train a small neural network on American-accented speech, there's a bigger chance that your neural network, because it's so clever, will learn to recognize American speech and have a harder time generalizing to British-accented speech. So here's one way the non-ML approach could fail to generalize: if British speakers were systematically louder or softer than American speakers. I don't know; I have no data on whether Americans are systematically louder or less loud than the British. But if one country just had louder or softer voices, then maybe the threshold you set wouldn't generalize well. That seems unlikely, though; I don't see that happening realistically. Whereas if you train a neural network with a lot of parameters, it's more likely that the neural network will pick up on some idiosyncrasy of American accents to decide whether somebody is even speaking, and thus may be less robust at generalizing to British-accented speech.

[01:08:44] Another way to think about this: to take an even more extreme example, imagine you're using VAD for a totally different language than English. Take Chinese, or Hindi, or Spanish, or something where the sounds are really different. If you create a VAD system to detect English, it may not work at all for detecting Spanish or Chinese or French or some other language. And if you think of British accents as somewhere on that spectrum, nothing like as different as a foreign language, but just more different, then I think the non-ML system is more likely to be robust. So one lesson that a lot of machine learning teams learn the hard way is this: if you don't need to use a learning algorithm for something, if you can hand-code a simple rule, like "if
volume is greater than 0.01, do this," those rules can be more robust. One of the reasons we use learning algorithms is for the things we can't hand-code: I don't know how to hand-code something to detect a cat, or to detect a car on the road, or to detect a person, so use learning algorithms for those. But when there is a hand-coded rule that actually does pretty well, you'll find that it is more robust to shifts in the data and will often generalize better.

[01:10:05] And if any of you take, well, we talk about this a little bit in CS 229, and I think CS 229A covers it as well: this particular observation is backed up by very rigorous learning theory. The learning theory basically says that the fewer parameters you have, while still doing well on your training set, that is, if you can have a model with very few parameters that does well on your training set, the better you generalize.
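Here is a toy numerical sketch of that point, with entirely made-up data (the "pitch" feature and both detectors are my illustration, not anything from the lecture): a one-parameter volume threshold transfers to a shifted "accent", while a flexible model that latched onto a training-set idiosyncrasy does not:

```python
# Made-up toy data: each clip is (volume, pitch); "pitch" stands in for an
# accent-specific idiosyncrasy a flexible model could latch onto.
def make_clips(pitch):
    speech = [(0.2 + 0.01 * i, pitch) for i in range(50)]   # loud clips
    silence = [(0.0001 * i, pitch) for i in range(50)]      # quiet clips
    return speech, silence

def threshold_vad(clip, epsilon=0.01):
    # One parameter: is the volume above epsilon?
    return clip[0] > epsilon

class MemorizingVAD:
    """Deliberately overfit model: calls a clip speech only when its pitch
    matches a pitch seen alongside speech during training."""
    def fit(self, speech):
        self.pitches = {pitch for _volume, pitch in speech}
    def predict(self, clip):
        return clip[1] in self.pitches

def accuracy(detect, speech, silence):
    correct = sum(detect(c) for c in speech) + sum(not detect(c) for c in silence)
    return correct / (len(speech) + len(silence))

us_speech, us_silence = make_clips(pitch=120)   # training distribution
uk_speech, uk_silence = make_clips(pitch=150)   # shifted distribution

overfit = MemorizingVAD()
overfit.fit(us_speech)

print(accuracy(threshold_vad, uk_speech, uk_silence))    # -> 1.0
print(accuracy(overfit.predict, uk_speech, uk_silence))  # -> 0.5
```

The threshold only ever looked at volume, so the pitch shift cannot hurt it; the memorizer keyed on the training pitch and falls to chance on the shifted data.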
So there's very rigorous machine learning theory that basically says exactly that. And in the case of the non-machine-learning approach, there's maybe one parameter, which is the threshold epsilon, and if that fits well on your training set, then your odds of generalizing, even when the data changes, are much higher.

[01:10:52] Now, the last question I want to pose for discussion today is about deployments. One of the lessons of deployment is that this is just the way the world works: you build a machine learning system, you deploy it, the world will usually change, and you often end up collecting data, integrating it, and improving the model, maybe fixing it to work for British speakers, or whatever. So, we talked about edge deployments as well as cloud deployments.
ignoring issues [01:11:24] appointments and so um ignoring issues of user privacy and latency which is [01:11:26] of user privacy and latency which is super important but for purposes [01:11:28] super important but for purposes question let's let's let's put aside [01:11:30] question let's let's let's put aside issues of user privacy and network [01:11:32] issues of user privacy and network latency if you need to maintain the [01:11:34] latency if you need to maintain the model sorry maintenance means updating [01:11:36] model sorry maintenance means updating the model right even as the world [01:11:38] the model right even as the world changes [01:11:40] sorry I missed mr. history does a column [01:11:43] sorry I missed mr. history does a column or H deployment make maintenance easier [01:11:46] or H deployment make maintenance easier if not of right why don't you watch you [01:11:48] if not of right why don't you watch you just enter a one-word answer [01:11:50] just enter a one-word answer and why right and so maintenance is [01:11:53] and why right and so maintenance is going the world changes something [01:11:55] going the world changes something changes so you need to update the [01:11:56] changes so you need to update the learning model to take it back take care [01:11:58] learning model to take it back take care of this British accent so which type of [01:12:01] of this British accent so which type of deployment makes it easier let me just [01:12:08] deployment makes it easier let me just take like yeah another two minutes and [01:12:10] take like yeah another two minutes and your answers [01:13:13] all right another 50 seconds [01:13:57] all right cool see what people wrote Wow [01:14:08] all right cool see what people wrote Wow cool great all right almost everyone the [01:14:09] cool great all right almost everyone the same college most people are saying [01:14:17] same college most people are saying cloud alright cool [01:14:21] cloud alright cool great 
And then just to summarize: [01:14:23] I think there are two reasons why most people say this is easier. Push updates, that's part of it. I think the other part is that if all the data lives at the edge, if all the data is processed, you know, at the user's home, and then things slow to a crawl, then even if you have all these unhappy British-accented users, you may not even find out, right? You sit at the company headquarters and you have all these users that mysteriously seem to not be using your device, maybe because they're unsatisfied with it, but if the data isn't coming into your service in the cloud, then you may not even find out about it. Now, there are serious issues about user privacy, as well as security, so please, if you ever build a product, please be respectful of that, and take care of it in a very thoughtful and respectful way toward your users.
But first, so this is the cloud: [01:15:11] if you have a lot of edge devices and all the data is processed there, um, you won't even know what your users are doing, whether they're happy or unhappy; you just don't know. But if some of the data is streamed to your service in the cloud, and where user privacy is concerned, please use good user consent, tell people what you're doing with the data; if you take care of that in a reasonable and sound way, and you're able to examine some of the data, then you can at least figure out that, gee, it looks like, analyzing the data, there are these people with this accent or this background noise that are getting a bad experience. And you can also maybe gather the data from the edge to feed back to your model. So it lets you detect that something's gone wrong, and it lets you have the data to retrain the model to solve the British
[01:16:04] accent problem; you can retrain the model for, you know, British accent speakers, and then finally it lets you push the model back out. Okay, so: one, it lets you detect what's going on; two, it gives you data for training; and three, it lets you more easily push the new model to production, that is, deployment. Okay. Oh, and this is also why, even if your computation needs to run on the edge, if you can, in a way that's respectful of user privacy and transparent about how you use data, get even a small sample of data, or have a few volunteer users send you some data back to the cloud, that will greatly increase your ability to detect that something's gone wrong, and maybe give you some data to retrain the model. So even if you can only do that plus push updates, right, this will help greatly. Okay. Um, all right, so finally, one
last comment. [01:17:01] I think one last challenge is that with a lot of machine learning systems, you're not done at deployment; there's a constant, ongoing maintenance process. And I think one of the processes (you know, AI teams are getting better at this) is to set up QA, to make sure that when we update the model we don't break something. So I think QA, the Quality Assurance process as it's called in large companies, testing: I think the way you test machine learning algorithms is different from the way you test traditional software, because the performance of machine learning algorithms is often measured in a statistical way, right? It's not that it works or it doesn't work; it neither works nor doesn't work; instead it works, you know, 95 percent of the time or something. And so a lot of companies are evolving their QA processes toward this type of statistical testing, to make sure that even
when you change the model and push an update, it [01:17:47] still works, you know, 95 or 99 percent of the time or something. So, putting in place new QA testing processes as well. Okay. All right, I hope that was helpful, stepping through what the full arc of a machine learning project looks like. Later in this course, there will be later lectures where we keep talking about machine learning strategy and how to make these decisions. So let's break for today.

================================================================================ LECTURE 004 ================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 4 - Adversarial Attacks / GANs
Source: https://www.youtube.com/watch?v=ANszao6YQuM
---
Transcript

[00:00:04] Okay, let's get started. So welcome to lecture number four. Today we will go over two topics that are not discussed in the Coursera videos. You've been learning C2M1 and C2M2, if I'm not mistaken, so you've learned about what initialization is, how to tune your neural networks, and what test, validation,
[00:00:31] and train sets are. Today we're going to go a little further. You should have the background to understand 80% of this lecture; there's maybe 20% that I want you to look back at after you've seen the batch norm videos, for those of you who haven't seen them. So we'll split the lecture in two parts, and I put the attendance code back at the very end of the lecture, so don't worry. One topic is attacking neural networks with adversarial examples; the second one is generative adversarial networks. And although these two topics have a common word, which is adversarial, they are two separate topics; you will understand why it's called adversarial in both cases. So let's get started with adversarial examples. In 2013, Christian Szegedy and his team published a paper called "Intriguing properties of neural networks." What they noticed is that neural
networks have kind [00:01:29] of a blind spot, a spot for which several machine learning models, including the state-of-the-art ones that you will learn about (VGG-16, VGG-19, Inception networks, and residual networks), are vulnerable to something called adversarial examples. You're going to learn what these adversarial examples are in three parts: first, by explaining how these examples, in the context of images, can attack a network in its blind spot and make the network classify these images as something totally wrong; then, how to defend against this type of example; and finally, why networks are vulnerable to this type of example, which is a little bit more theoretical, and we're going to go over it on the board. The papers listed at the bottom are the two big papers that started this field of research, so I would advise you to go and read them, because we have only one hour
and a half [00:02:29] to go over two big topics in deep learning, and we will not have the time to go into the details of everything. Okay, so let's set up the goal. The goal is: given a pretrained network, so a network trained on ImageNet, on a thousand classes, millions of images, find an input image that is not an iguana, so it doesn't look like the animal iguana, but will be classified by the network as an iguana. We will call this an adversarial example if we manage to find it. Okay, yeah, one question? [inaudible student question] Let me write it down on the board. Can you guys see? Okay. So we have a network pretrained on ImageNet; it's a very good network. What I want is to fool it by giving it an image that doesn't look like an iguana but is classified as an iguana. So if I give it a cat image to start with, the network is obviously going to give me a vector of probabilities
that has the maximum [00:03:46] probability for cat, because it's a good network, and you can guess what the output layer of this network is: probably a softmax, since it's a classification network. Now what I want is to find an image x that is going to be classified as an iguana by the network. Okay, does the setting make sense to everyone? Okay. Now, as usual, this might remind you of what we've seen together about neural style transfer. Remember the art generation thing, where we wanted to generate an image based on the content of a first image and the style of another image. In that problem, the main difference from classic supervised learning was that we fixed the parameters of the network, which was also pretrained, and we backpropagated the error of the loss all the way back to the input image, to update the pixels so that it looks like the content of the content image
and the style of the style image. [00:04:46] The first thing we did is that we rephrased the problem; we tried to phrase exactly what we want. So what would you say is a sentence that defines our loss function? Let's see, yes? [00:05:20] "An image that provides minimum cost." Okay, what's the cost you're talking about? "Expected iguana and not expected iguana." What do you mean exactly by that? We're trying to train it... yeah, okay. So you want this image to minimize a certain loss function, and the loss function would be a distance metric between the output you get and the output you want. Okay, yeah. So I would say we want to find x, the image, such that y-hat of x, which is the result of the forward propagation of x through the network, is equal to y-iguana, which is a one-hot vector with a one at the position of iguana. Does that make sense? So now, based on that, we define our loss function, which can be an L2 loss, can
be an L1 loss, [00:06:15] can be a cross-entropy; in practice, this one works better. So you see that minimizing this loss function will lead our image x to be output as an iguana by the network. That makes sense? And then the process is very similar to neural style transfer, where we optimize the image iteratively. We start with x, forward propagate it, compute the loss function that we just defined, and, remember, we're not training the network, right: we take the derivative of the loss function all the way back to the input, and update the input using a gradient descent algorithm, until we get something that is classified as an iguana. Yeah, any question on that? Okay. So, you mentioned that it doesn't guarantee that x is not going to look like something; the only thing it's guaranteeing is that this x will be classified as an iguana if we trained properly. We will talk about that.
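As a rough illustration of the loop just described, here is a toy sketch, not the lecture's ImageNet setup: the "network" is a single frozen sigmoid unit, a scalar target plays the role of y-iguana, and a squared error stands in for the loss. All names and constants here are illustrative assumptions.

```python
import math

# Toy stand-in for the attack loop above: freeze the "pretrained" parameters
# and run gradient descent on the *input*, never on the weights.

w, b = 2.0, -1.0  # frozen parameters: never updated below

def forward(x):
    """Forward propagation through the fixed 'network'."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def attack(x, target=1.0, lr=0.5, steps=500):
    """Gradient descent on the input x so that forward(x) approaches target."""
    for _ in range(steps):
        y_hat = forward(x)
        # dL/dx for L = (y_hat - target)^2, using sigmoid'(z) = y_hat*(1 - y_hat)
        grad = 2.0 * (y_hat - target) * y_hat * (1.0 - y_hat) * w
        x -= lr * grad  # update the input, the 1-D analogue of the pixels
    return x

x_adv = attack(0.0)
print(forward(0.0), forward(x_adv))  # the output is pushed toward the target
```

The same steps appear as in the lecture: forward propagate, compute the loss against the target class, take the derivative all the way back to the input, and update the input; only the scale (one scalar instead of a 64 by 64 by 3 image) differs.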
Now, another question in the back. [00:07:18] Yeah? Oh yeah, it could be binary cross-entropy, it could be cross-entropy. So in this case, not binary cross-entropy, because we have a vector of n classes, but it could have been cross-entropy here, okay. So yeah, that's true: do we guarantee that the forged image x, this one, is going to look like an iguana? Who thinks it's going to look like an iguana? A few. Who thinks it's not going to look like an iguana? Okay, the majority of you. So can someone tell me why it's not going to look like an iguana? [00:08:13] Okay, so you say the loss function is unconstrained, is very unconstrained, so we didn't put any constraint on what the image should look like. That's true. Actually, the answer to this question is: it depends. We don't know; maybe it looks like an iguana, maybe it doesn't; but in terms of probabilities, there's a high chance that it doesn't look
like an iguana. [00:08:30] So the reason is here: let's say this is our space of input images. An interesting thing is that, even if as humans, on a daily basis, we deal with images of the real world (I mean, if you look at a TV that is totally buggy, you see pixels, random pixels, but in other contexts we usually see real-world-distribution images), our network is deterministic: it takes any input image that fits the first layer and produces an output, right? So this is the whole space of input images that the network can see, and this is the space of real images; it's a lot smaller. Can someone tell me what's the size of the space of possible input images for a network? So, infinite? It's not infinite. It's a lot, but okay. Yeah, there is an idea here: someone says the number of possible pixel permutations. Yeah, that's true. So more precisely, you would start with
[00:09:39] how many pixel values there are. There are 256 pixel values. And then what's the size of an image? Let's say 64 by 64 by 3, and your result would be 256 to that power: you fix the first pixel, 256 possible values, then the second one can be anything else, then the third one can be anything else, and you end up with a very big number. So this is a huge number, and the space of real images is here. Now if we had to plot the space of images classified as an iguana by the network, it would be something like that, right? And you see that there is a small overlap between the space of real images and the space of images classified as an iguana by the network, and this is where we probably are not: we're probably in the green part that is not overlapping with the red part, because we didn't constrain our optimization problem. Does that make sense? Okay.
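The counting argument is easy to check directly. A quick sketch, assuming the 64 by 64 by 3 size and 256 pixel values used above:

```python
import math

# Count the input space as in the argument above: fix the first pixel
# (256 choices), then the second, and so on for every value in the image.
n_levels = 256
n_values = 64 * 64 * 3            # one image is 12288 pixel values
n_images = n_levels ** n_values   # exact big integer: 256**12288

# How large is that? Count its decimal digits via logarithms.
digits = math.floor(n_values * math.log10(n_levels)) + 1
print(n_values, digits)           # 12288 values, 29593 digits
```

Finite, as the lecturer says, but the count has close to thirty thousand decimal digits, which is why the space of real images is a vanishingly small region inside it.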
Now we're going to constrain it [00:10:39] a little bit more, because in practice this type of attack is not too dangerous: as humans, we would see that the pictures look like garbage. The dangerous attack is when the picture looks like a cat, the network sees it as an iguana, and humans see it as a cat. Can someone think of malicious applications of that? Face recognition: you could show a picture of your face and push the network to think it's the face of someone else. What else? Yeah, breaking CAPTCHAs: if you know what output you want, you can force the network to think that this input CAPTCHA is the output it's looking for. Or in general, I would say, social media: if someone is malicious and wants to put violent content online, all these companies have algorithms to check for this violent content; if people can use
adversarial examples [00:11:49] that still look violent but are not detected as violent by the algorithms, using this methodology they could still publish their violent pictures. Think about self-driving cars: a stop sign that looks like a stop sign to everyone, but when the self-driving car sees it, it's not a stop sign. So these are malicious applications of adversarial examples, and there are a lot more. Okay, and in fact the picture we generated previously would look like that; it's nothing special. So now let's constrain our problem a little bit more. We're going to say we want the picture to look like a cat but be classified as an iguana. Okay, so now, same thing: we have our neural network; if we give it a cat, it's going to predict that it's a cat. What we want is
before so I just plot I just put [00:12:45] we did before so I just plot I just put back what we had on the previous slide [00:12:47] back what we had on the previous slide okay exactly the same thing now the way [00:12:51] okay exactly the same thing now the way we phrase our problem will be a little [00:12:52] we phrase our problem will be a little different instead of saying we want only [00:12:55] different instead of saying we want only y hat of x equals y y now we have [00:12:58] y hat of x equals y y now we have another constraint what's the other [00:12:59] another constraint what's the other constraint the picture X should be [00:13:11] constraint the picture X should be closer to the picture of the cat so we [00:13:12] closer to the picture of the cat so we want X equal or very close to X cat and [00:13:16] want X equal or very close to X cat and in terms of loss function what it does [00:13:18] in terms of loss function what it does is that it adds another term which is [00:13:23] is that it adds another term which is going to decide how X should be close to [00:13:25] going to decide how X should be close to X cat if we minimize this loss now we [00:13:27] X cat if we minimize this loss now we should have an image that looks like a [00:13:30] should have an image that looks like a cat because of the second term and that [00:13:31] cat because of the second term and that is predicted as an iguana because of the [00:13:34] is predicted as an iguana because of the first term does that make sense [00:13:36] first term does that make sense so we're just building up our loss [00:13:38] so we're just building up our loss functions and I guess you guys are very [00:13:39] functions and I guess you guys are very familiar with this type of thought [00:13:41] familiar with this type of thought process now okay an same process we [00:13:44] process now okay an same process we optimized until we hopefully get a cat [00:13:47] optimized until we hopefully get a cat 
now a question is, what should be the initial image we start with? We didn't talk about that in the previous example. [00:14:04] Yeah? White noise, well, yeah, possibly white noise. Any others? A cat, yeah, which cat? I don't know, probably the cat that we put in the loss function, right, because it's the closest one to what we want to get. So if we want a fast process, we'd better start with exactly this cat, which is the one we put in our loss function here, right. If we put another cat, it's going to take a little longer, because we have to change the pixels of the other cat to look like this cat; that's what we told our loss function. If we start with white noise it will take even longer, because we have to change the pixels all the way so that it looks real, and then it looks like the cat that we defined here. So yeah, the best thing would probably be to start with the picture of the cat. Does that make sense? And
then move the pixels so that this term is also minimized. [00:15:22] Yeah, this is empirical, the fact that we use that type of loss function, but in practice it could have been any distance between x and x_cat, and any distance between y hat and y. Yeah? [00:16:09] Exactly, it's a bunch of cats. I'm not sure about the second method, but just to repeat the point you mentioned: here we had to choose a cat, meaning x_cat is actually an image of a cat. So what if we don't know what the cat should look like, and we just want a random cat to come out and be classified as an iguana? We're going to see generative networks later, which can be used to do that type of thing, but for the second part of the question I'm not sure what the optimization process would look like, okay. Let's move on. So yeah, it's probably a good idea to start with the cat
image that we specified in the loss function, okay. And so then we have an image of a cat that originally was classified as 92% cat, and we modified a few pixels, so you can see that this image looks a little blurry. [00:16:59] By doing this modification, the network will think it's an iguana, okay? And sometimes this modification can be very slight, and we may not even be able to notice it. Sounds good? [00:17:12] Now let's add something else to this graph. We add a third set, which is the space of images that look real to humans. That's interesting, because the space of images that look real to humans is actually a bigger space than the space of real images. An example is this one: this is probably an image that looks real to a human, but it's not an image that we could have seen in daily life, because of these slight pixel changes, okay? So these are the space of dangerous adversarial examples: they look
[00:17:47] real to humans, but they're not actually real, and they might be used to fool models, okay. [00:17:56] Now let's see a video by Kurakin et al. on a real-world example of adversarial examples. For those who cannot see it: they're pointing a camera, which has a classifier, and the classifier classifies the first image as a library, and the second image, which looks the same to us, as a prison. So the second image has slightly different pixels, but that's hard for a human to see. Same here: the classifier on the phone classifies the first image as a washer with fifty-two percent confidence, and the second one as a doormat. So this is a small example of what can be done, okay. [00:18:52] Now, we've seen how to generate these adversarial examples, it's an optimization process. We will see what types of attacks can be carried out, and what are the
defenses against these adversarial examples. [00:19:03] We would usually split the attacks into two types: non-targeted attacks and targeted attacks. A non-targeted attack means we just want to find an adversarial example that is going to fool the model, while a targeted attack means we want to force this adversarial example to make the model output a specific class that we chose. These are two different types of attacks that are widely discussed in the research. [00:19:36] Knowledge of the attacker is something very important. For those of you who did some crypto, you know that we talk about white-box attacks and black-box attacks. So a white-box attack is when you have access to the network: we have our image and a pretrained network, and we have full access to all the parameters and the gradients, so it's
probably an easier attack, right? We can backpropagate all the way back to the image and update the image with it. [00:20:07] A black-box attack is when the model is, say, encrypted or something like that, so that we don't have access to its parameters, activations, and architecture. So the question is, how do we attack in a black-box setting, if we cannot backpropagate because we don't have access to the layers? Any ideas? Yeah, numerical gradient, yeah, good idea. So you will tweak the image a little bit and you will see how it changes the loss; looking at this you can have an estimate of the numerical gradient, even if the model is a black-box model. This assumes that you can query the model, right? You can query it. What if you cannot even query the model, or you can query it one time only, just to send your adversarial example? How would you do that? So
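The numerical-gradient idea suggested here can be sketched as follows, assuming the model is an opaque function `query_loss` that we can call but not inspect (the hidden weights exist only so the sketch can check itself):

```python
import numpy as np

rng = np.random.default_rng(1)
w_hidden = rng.normal(size=5)      # hidden from the attacker

def query_loss(x):
    # Stand-in for "send x to the black-box model, read back a score".
    return float(np.tanh(w_hidden @ x) ** 2)

def numerical_grad(x, eps=1e-5):
    # Central finite differences: two queries per input coordinate.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (query_loss(x + e) - query_loss(x - e)) / (2 * eps)
    return g

x = rng.normal(size=5)
g = numerical_grad(x)
# The true gradient, which the attacker cannot compute but we can check against:
t = np.tanh(w_hidden @ x)
true_g = 2 * t * (1 - t**2) * w_hidden
print(np.max(np.abs(g - true_g)))  # tiny estimation error
```

Each central difference costs two queries per coordinate, so one estimated gradient of a 64x64x3 image already takes about 24,576 queries, which is why limits on querying matter.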
this becomes more complicated. [00:21:07] So there is a very interesting property of these adversarial examples, which is that they're highly transferable. It means: I have a model here that is an animal classifier, okay, I don't have access to it, I cannot even query it, and I still want to fool it. What I'm going to do is build my own animal classifier, forge an adversarial example on it, and it's highly likely that it's going to be an adversarial example for the other one as well. This is called transferability, and it's still a research topic, okay. We're trying to understand why this happens and also how to defend against it. Maybe one defense against it is... we're going to see it after, I'm not going to say it now. So does that make sense or not, this transferability? Probably it's because two animal classifiers look at the same features in images, right, and maybe these
pixels that we're playing with are also changing the output of the other network. [00:22:13] Let's go over some kinds of defenses. One solution to defend against these adversarial examples is to create a safety net. What is a safety net? It's like a firewall: you put it before your network, every image that comes in will be classified as fake (forged) or real by this network, and you only take those which are real and not adversarial. Does that make sense? So you could say, okay, but we can also build an adversarial example that fools this network, right? Be it black box or white box, we can just create an adversarial example for this network. It's true, but the issue is that now we have two constraints: we have to fool the first one and the second one at the same time. You know, maybe if you fool the first
one, there is a chance that the second one is going to be fooled, we don't know, okay. It just makes it more complex; there is no good defense at this point to all types of adversarial examples. This is an option that people are researching, and the paper is here if you want to check it out. [00:23:21] Can you guys think of another solution? Train on multiple loss functions through different networks? So you're talking about an ensemble: maybe we can create five networks to do our task, and it's highly unlikely that the adversarial example is going to fool the five networks the same way, right? Any other ideas? [00:23:58] Generate adversarial examples and train on those, okay. So you will generate a cat image that is adversarial, so some pixels have been changed to fool a network; you will label it as the human sees it, so as a cat, because you want the network to still see it as a cat, and you will train on
[00:24:24] those. The downside of that is that it's very costly; we've seen that generating adversarial examples is super costly, and also we don't know if it will generalize to other adversarial examples, maybe we're going to overfit to the ones we have. So it's another optimization problem. [00:24:40] Another solution is to train on adversarial examples at the same time as we train on normal examples. So look at this loss function: the new loss is a sum of two loss functions. One is the classic loss function we would use, let's say cross-entropy in the case of classification, and the second one is the same loss function, but we give it the adversarial example. So what's the complexity of that at every gradient descent step? [00:25:22] For every iteration of our gradient descent, we're going to have to iterate enough to forge an adversarial example at
every step, right? Because we have x, what we want to do is forward propagate x through the network to compute the first term, generate x_adversarial with the optimization process and forward propagate it to calculate the second term, and then backpropagate over the weights of the network. This is super costly as well, and it's very similar to what you said, just done online, all the time, okay. [00:25:56] What is interesting, and we're going to delve a little more into it: there's another technique called logit pairing. I just put it here, we're not going to talk about it, there's a paper here if you want to check it; it's another way to do adversarial training. But what I would like to talk about is, more from a theoretical perspective, why are neural networks vulnerable to adversarial examples? So let's do some work on the board. [00:26:39] Yeah, the noise thing is also nice, but
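The combined loss L_new = L(y hat(x), y) + L(y hat(x_adv), y) can be sketched on a toy logistic regression. The single sign-of-the-gradient step used here to forge x_adv is an assumed choice of inner attack, since the lecture leaves that part open:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)
X = rng.normal(size=(64, 4))
Y = (X @ np.array([1.0, -2.0, 0.5, 1.0]) > 0).astype(float)  # toy labels
eps, lr = 0.1, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads(x, y):
    # Cross-entropy on a logistic unit: returns (dL/dw, dL/dx).
    p = sigmoid(w @ x)
    return (p - y) * x, (p - y) * w

for _ in range(200):
    for x, y in zip(X, Y):
        g_w, g_x = grads(x, y)
        x_adv = x + eps * np.sign(g_x)     # forge x_adv at this very step
        g_w_adv, _ = grads(x_adv, y)
        w -= lr * (g_w + g_w_adv)          # L_new = L(x, y) + L(x_adv, y)

acc = np.mean((sigmoid(X @ w) > 0.5) == (Y > 0.5))
print(acc)
```

Note that x_adv is re-forged at every single update, which is exactly the extra per-step cost the lecture points out.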
the thing is that it's just like in crypto: every time you come up with a defense, someone will come up with an attack, and it's a race between humans, you know. So this is the same type of problem that security problems are. [00:26:57] Okay, so let's go over something interesting that is more on the intuition side of adversarial examples, so let me write down something. One question we asked ourselves is, why do adversarial examples exist? What's the reason? Ian Goodfellow and his team came up with an explanation in one of the seminal papers on adversarial examples, where they argue that although many people in the past have attributed the existence of adversarial examples to the high non-linearity of neural networks and to overfitting, because we overfit to a specific dataset, we don't actually understand what cats are, we're just
understanding what we've been trained on. They argue that it's actually the linear parts of networks that are the cause of the existence of adversarial examples. So let's see why, and the example I'm going to look at is linear regression. [00:27:58] Like logistic regression, linear regression is basically the same thing but without the sigmoid: before the sigmoid we have y hat equals Wx plus b. So the forward propagation of our network is going to be y hat = Wx + b, and our first example is going to be a six-dimensional input. [00:28:33] We have a neuron here, but the neuron doesn't have any activation, because we're doing linear regression, so what happens here is simply Wx + b, and then we get y hat, and we'd probably use an L1 or L2 loss, because it's a regression problem, to train this network. Now let's look at the first example, a
first example, where we trained our network. So the network has been trained and converged [00:29:24] to W = [1, 3, -1, 2, 2, 3]. This is W, and, you know, because we defined x to be a column vector of size 6, W has to be a row vector of size 6. So the network converged to this value of W, and b = 0. Now we're going to look at this input; we're giving a new input to the network, and the input is going to be x = [1, -1, 2, 0, 3, -2]. [00:30:12] So I'm going to forward propagate this to get y hat = Wx + b, and this value is going to be 1 - 3 - 2 + 0 + 6 - 6, if I didn't make a mistake, so we basically get -4. [00:30:59] Okay, so this is the first example that was forward propagated. Now the question is how to change x into x* [00:31:22] such that y hat changes radically, but x* stays close to x. So this is basically our problem with adversarial examples: can
we find an example that is very close to x but radically changes the output of our network? We're trying to build intuition about adversarial examples. [00:32:02] So the interesting part is to identify how we should modify x, and the intuition comes from the derivative. If you take the derivative of y hat with respect to x, you know that the definition of this term is, like, correlated to the impact on y hat of small changes of x, right? What's the impact of small changes of x on the output? And if you compute it, what do you get? W. Everybody agrees? What's the shape of this thing? The shape of that is the same as the shape of x, so it should be W transpose; remember, the derivative of a scalar with respect to a vector has the shape of the vector. [00:33:18] Now it's interesting to see this, because if we compute x* to be, let's say, x plus a small perturbation
like, I will call it the perturbation. Yeah, sorry, can you see at the top? Yes or no? [00:33:48] So what if x* equals x plus epsilon times W transpose, you know, and this epsilon I will call the value of the perturbation. Now if we forward propagate x*, it means we do y hat* = W x* + b, but b is 0 at this point, so we're going to get Wx plus epsilon times W W transpose, and W times W transpose is a dot product, right? So this is the same as the norm of W squared. [00:34:45] What is interesting, the smart part here, is that this term is always going to be positive. It means we moved x a little bit, because we can make this change small by choosing a small epsilon, but it's going to push y hat to a larger value for sure, you know. And if I had a minus here instead of a plus, it would push y hat to a smaller value. And the interesting thing is, now, if we compute x* to be x
plus epsilon times W transpose, and we take epsilon to be a small value, let's say 0.2, [00:35:28] you can make the calculation, and what we get is this: x* = [1, -1, 2, 0, 3, -2] + 0.2 × [1, 3, -1, 2, 2, 3]. So if you look at that, all the positive values have been pushed... sorry, let's finish the calculation and I'll give the insight after: [1.2, -0.4, 1.8, 0.4, 3.4, -1.4]. So this is our x* that we hope to be adversarial. [00:36:43] Okay, let's compute y hat* to see what happens. It's W x* + b, which is zero, so what we get when we multiply W by x* is 1.2, minus 1.2, minus 1.8, plus 0.8, plus six
point eight and minus 4.2 which I believe is going to [00:37:32] minus 4.2 which I believe is going to give us zero point five okay so we see [00:37:40] give us zero point five okay so we see that a very slight change in X star has [00:37:44] that a very slight change in X star has pushed Y hat from minus 4 to point 5 and [00:37:49] pushed Y hat from minus 4 to point 5 and so a few things we want to notice here [00:37:59] so insights on this on this small [00:38:01] so insights on this on this small example the first one is that if W is [00:38:09] example the first one is that if W is large then X star is not similar to X [00:38:22] large then X star is not similar to X right the larger the W the less X star [00:38:27] right the larger the W the less X star is is likely to be like X and [00:38:28] is is likely to be like X and specifically if one entry of the a value [00:38:31] specifically if one entry of the a value is very large X I the pixel [00:38:35] is very large X I the pixel corresponding to this entry is going to [00:38:36] corresponding to this entry is going to be very different from X I star if W is [00:38:42] be very different from X I star if W is large X star is going to be different [00:38:44] large X star is going to be different than X so what we're going to do is that [00:38:47] than X so what we're going to do is that we're going to take sine sine of W [00:38:55] we're going to take sine sine of W instead of taking W what's the reason [00:38:58] instead of taking W what's the reason why we do that because the interesting [00:38:59] why we do that because the interesting part is the sine of of the W it means if [00:39:03] part is the sine of of the W it means if we play correctly with the sign of W we [00:39:07] we play correctly with the sign of W we will always push the X this term W X [00:39:12] will always push the X this term W X star in the positive side because every [00:39:16] star in the positive side because every entry here this 
multiplication is going [00:39:18] entry here this multiplication is going to give us a positive number right and [00:39:23] the second insight is that as X grows in [00:39:31] the second insight is that as X grows in dimension the impact of plus Epsilon [00:39:44] dimension the impact of plus Epsilon sign of W increases that makes sense [00:40:00] so the impact of sign of W on white hats [00:40:05] so the impact of sign of W on white hats increases and so what's interesting to [00:40:10] increases and so what's interesting to notice is that we can keep epsilon as [00:40:13] notice is that we can keep epsilon as small as possible it means X and X star [00:40:15] small as possible it means X and X star will be very similar but as we grow in [00:40:18] will be very similar but as we grow in dimension we're going to get more term [00:40:20] dimension we're going to get more term in this a lot more term and the change [00:40:23] in this a lot more term and the change in Y hat is going to grow and grow and [00:40:25] in Y hat is going to grow and grow and grow and grow and grow and so the one [00:40:27] grow and grow and grow and so the one reason why adversity all examples exist [00:40:29] reason why adversity all examples exist for images is because the dimension is [00:40:32] for images is because the dimension is very high 64 by 64 by 3 so we can make [00:40:36] very high 64 by 64 by 3 so we can make epsilon very small and take the sign of [00:40:39] epsilon very small and take the sign of W we will still get Y hat to be far from [00:40:44] W we will still get Y hat to be far from the original value that it had does it [00:40:46] the original value that it had does it make sense do you guys have any question [00:40:49] make sense do you guys have any question on that so epsilon doesn't grow with the [00:40:53] on that so epsilon doesn't grow with the dimension but its impact of this term [00:40:56] dimension but its impact of this term increases with the dimension 
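The arithmetic above can be checked mechanically. The concrete values x = (1, -1, 2, 0, 3, -2) and W = (1, 3, -1, 2, 2, 3) are a reconstruction inferred from the spoken numbers (they reproduce every intermediate term, but they are not stated explicitly in the transcript). A minimal pure-Python sketch of both insights:

```python
import random

# Reconstructed toy example (x and W are assumptions inferred from the
# spoken arithmetic; they reproduce each intermediate term in the lecture).
eps = 0.2
x = [1.0, -1.0, 2.0, 0.0, 3.0, -2.0]
W = [1.0, 3.0, -1.0, 2.0, 2.0, 3.0]   # single row of weights, b = 0

def y_hat(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

print(y_hat(W, x))                                   # -4.0

# Perturb along W^T: x* = x + eps * W^T
x_star = [xi + eps * wi for xi, wi in zip(x, W)]
print([round(v, 1) for v in x_star])                 # [1.2, -0.4, 1.8, 0.4, 3.4, -1.4]
print(round(y_hat(W, x_star), 1))                    # 1.6, pushed up from -4

# Insight 2: with x* = x + eps * sign(W), the shift in y_hat is
# eps * sum(|w_i|), which grows with the input dimension n even
# though each individual pixel moves by at most eps.
random.seed(0)
shifts = []
for n in [10, 1_000, 64 * 64 * 3]:                   # 64*64*3: image size from the lecture
    Wn = [random.uniform(-1, 1) for _ in range(n)]
    shifts.append(eps * sum(abs(w) for w in Wn))
print([round(s, 1) for s in shifts])                 # grows roughly linearly in n
```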
[00:41:27] [Student question, partly inaudible: what if you put something in between that maps the adversarial example back to another image, and train it adversarially?] Yeah, I don't know if that has been done; I don't think that has been done. You're talking about taking something like an autoencoder that takes the adversarial example, converts it back to a normal image of the cat, and then classifies the cat? Maybe, yeah, I don't know; it's a topic of research. [00:42:03] Okay, let's move on, because we don't have too much time. So to conclude, what we're going to keep as a general way to generate adversarial examples is this formula, x* = x + ε sign(∇ₓJ); this is going to be a fast way to generate adversarial examples, and the method is called the fast gradient sign method. Basically, what we're doing is linearizing the cost function in the proximity of the parameters.
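The fast gradient sign step x* = x + ε sign(∇ₓJ(x, y)) can be sketched on a toy logistic unit. This is an assumption for illustration, not the lecture's board example: for ŷ = sigmoid(w·x) with binary cross-entropy J, the input gradient has the standard closed form ∇ₓJ = (ŷ - y) w, so only signs are needed.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, w, y_true, eps):
    """One fast-gradient-sign step: x* = x + eps * sign(grad_x J).

    For y_hat = sigmoid(w . x) with binary cross-entropy J,
    grad_x J = (y_hat - y_true) * w, so per coordinate we only
    need sign((y_hat - y_true) * w_i).
    """
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    g = y_hat - y_true
    return [xi + eps * sign(g * wi) for xi, wi in zip(x, w)]

# Toy weights and input reused from the worked example; true label 0 ("not a cat").
w = [1.0, 3.0, -1.0, 2.0, 2.0, 3.0]
x = [1.0, -1.0, 2.0, 0.0, 3.0, -2.0]

p_before = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))     # ~0.018
x_adv = fgsm(x, w, y_true=0.0, eps=0.2)
p_after = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)))  # ~0.168
print(round(p_before, 3), round(p_after, 3))
```

Every coordinate moves by exactly ε, yet the score w·x jumps by ε·Σ|wᵢ|, so the predicted probability moves toward the wrong class; with image-sized inputs that sum is far larger.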
And we're saying that what applied to the linear network here is also going to apply, through this general formula, to deeper networks: we're pushing the image's pixels in the one direction that is going to impact the output the most. Okay, so that's the intuition behind it. [00:43:05] Now you might say: okay, we did this example on a linear network, but neural networks are not linear, they're highly nonlinear. In fact, if you look at where the research has been going for the past few years, we're trying to linearize all the behaviors of these neural networks: with ReLU, for example, or with careful initialization, all those types of methods. Even with a sigmoid, when we train we do all we can to put the sigmoid in its linear regime, because we want fast training. [00:43:36] Okay, and one last thing that I'll mention for adversarial examples: if I have a network like this, fully connected, with three-dimensional inputs, then one hidden layer here, and then the output, what's interesting is that computing the chain rule on this neuron will give you that the derivative of the loss function with respect to, let's say, x is equal to the derivative of the loss function with respect to z₁₁ here, times the derivative of z₁₁ with respect to x. There's actually a summation here, but let me just illustrate the point. [00:44:44] What we're trying to do with neural networks is to have these gradients be high, because if this gradient is not high, we're not able to train the parameters of this neuron. And we need this gradient to be high because if you want to do the same thing with W₁₁, the parameters related to this neuron, you would need to go through this same term, correct? So we need this gradient to be high.
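Written out with the summation he alludes to (assuming z₁ₖ denotes the pre-activation of the k-th unit in the first layer), the two derivatives share a common factor:

```latex
\frac{\partial \mathcal{L}}{\partial x}
  = \sum_{k} \frac{\partial \mathcal{L}}{\partial z_{1k}}\,
             \frac{\partial z_{1k}}{\partial x},
\qquad
\frac{\partial \mathcal{L}}{\partial W_{11}}
  = \frac{\partial \mathcal{L}}{\partial z_{11}}\,
    \frac{\partial z_{11}}{\partial W_{11}}
```

Training requires ∂L/∂z₁₁ to be large so that W₁₁ receives a useful gradient, and that same factor scales ∂L/∂x, which is exactly the gradient the fast gradient sign method exploits.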
And if this gradient is high, the gradient with respect to the input is also going to be high, because you use the same gradient in the chain rule. So networks that have high gradients and that are operating in the linear regime are even more vulnerable to adversarial examples, because of this observation. [00:45:35] So, any questions on adversarial examples before we move on? I think we don't have time, and I would like to go over GANs with you guys, so let's move on to GANs; I'll stick around to answer questions on that part. [00:45:51] So the general question we're asking now is: do neural networks understand the data? Because we've seen that some data points look like they would be real, but the neural networks don't understand them. More generally, can we build generative networks that can mimic the real-world distribution of, let's say, images? This is what we will call
generative adversarial networks. [00:46:15] We'll start by motivating them, and then we'll look at something called the minimax game between two networks, a generator and a discriminator, which are going to help each other improve. Finally, we will see that GANs are hard to train, we'll see some tips for training them, and we'll go over some nice results and methods for evaluating GANs. [00:46:41] Okay, so the motivation behind generative adversarial networks is to endow computers with an understanding of our world. By that we mean that we want to collect a lot of data and use it to train a model that can generate images that look like they're real, even if they're not; so a dog that has never existed can be generated by this network. And finally, the number of parameters of the model is smaller than the amount of data (we already talked about that), and this is the intuition behind why a generative network can exist: there is so much data in the world (any image counts as data for a generative network) and there are not enough parameters to mimic this data exactly, so the network needs to understand the salient features of the dataset, because it doesn't have enough parameters to overfit everything. [00:47:35] So let's talk about probability distributions. These are samples of real images that have been taken, and if you plot this real data distribution on a 2D map, it would look something like this. I made it up, but this is the image space, similar to what we talked about for adversarial examples, and this green shape is the space of real-world images. Now, if you train a generator and it generates some images that look like this (these images come from StackGAN), this distribution, if the generator is not good, is not going to match the real-world
distribution. [00:48:16] So our goal here is to do something so that the red distribution matches the real-world distribution; we're going to train the network so that it realizes what we want. [00:48:28] So this is our generator, and it's what we ultimately want to train. We want to give it, let's say, a random number, a random latent code of 100 scalar values, and we want it to output an image. But of course, because it's not trained initially, it's going to output a random image that looks something like this: random pixels. Now, this image doesn't look very good; what we want is for these generated images to be very similar to the real world. So how are we going to help this generator train? It's not like what we did in classic supervised learning, because we don't really have inputs and labels; there is no label. We could [00:49:16] maybe give it an image of a cat and ask it to output another cat, but we want the network to be able to output things that don't exist, things we've never seen, right? We want the network to understand what a cat is, but not overfit to the cat we give it. [00:49:33] So the way we're going to do it is through a small game between this network, called the generator G, and another network, called the discriminator D. Let's look at how it works. We have a database of real images, and we're going to start with this distribution on the bottom, which is the real-world data distribution, the distribution of the images in this database. [00:49:59] Now, our generator initially has this other distribution; it means the pixels that you see here follow a distribution that doesn't match the real world. We will define a discriminator D, and the goal of the discriminator will be to detect whether an image
is real or not. [00:50:19] So we're going to give several images to this discriminator: sometimes we will give it generated images, and sometimes we will give it real-world images. What we want is for this discriminator to be a binary classifier that outputs 1 if the image is real and 0 if the image was generated. Okay? So let's say we give it an x coming from the generated images: it should give us 0, because we want the discriminator to detect that x was actually G(z). If the image came from our database of real images, we want the discriminator to say 1. [00:51:02] So it seems like the discriminator would be easy to train, right? It's just binary classification; we can define a loss function that is the binary cross-entropy. And the good thing is that we can have as many labels as we want; it's unsupervised, but a little bit supervised, you know: we have this database and we label it all as 1 (these images exist, so let's label them 1 for the discriminator), and everything that comes out of the generator, let's label it 0 for the discriminator. So basically, data is not costly at all at this point. [00:51:34] The way we will train is that we will backpropagate the gradient to the discriminator, to train the discriminator using a binary cross-entropy, roughly. But what we ultimately want is to train the generator; that's what we want at the end. We're not going to use the discriminator; we just want to generate images. So we're also going to direct the gradient to go back to the generator. And why does this gradient go back to the generator? The reason is that x = G(z); it means we can backpropagate the gradient all the way back to the input of the discriminator, but this input depends on the input of the generator if the image was generated, so we can also backpropagate and direct the gradient to the generator. Does that make sense? There is a direct relation between z and the loss function in the case where the image was generated. If the image was real, then the generator couldn't get a gradient, because x doesn't depend on z or on the features and parameters of the generator. [00:52:32] Okay, so we would run an algorithm such as Adam simultaneously on two mini-batches: one for the true data and one for the generated data. Does this scheme make sense to everyone? [00:52:46] Yeah, one question. [Student question about mixing real and generated examples within one batch.] So there are many methods; usually we would use one mini-batch for the real data and one mini-batch for the fake data, but in practice you can try other things. There are many methods being tried to train GANs properly; we're going to delve a little more
into the details of that when we see the loss functions. [00:53:19] So we hope that the probability distributions will match at the end, and if they match, we're going to just take the generator and generate images; normally it should be able to generate images that look real, that look like they came from this distribution. Okay, sounds good. [00:53:36] So now let's talk more about the training procedure and try to figure out what the loss function should be in this case. What should the cost of the discriminator be, assuming we give it two mini-batches, one of real data (real images) and one of generated data that comes from G? [Student: the same basic loss function we use for binary classifiers?] Yes, the same basic loss function we use for binary classifiers. It's true; we're going to tweak it a tiny bit, but it's the same idea. [00:54:18] So this is what it can look like. We're going to call it J⁽ᴰ⁾, the cost function of the discriminator. It has two terms. What does the first term say, and what does the second term say? You can recognize the binary cross-entropy here; the only difference is that we have a label y_real and a label y_generated. In practice, y_real and y_generated are always going to be set to known values: we know that y_generated is 0 and we know that y_real is 1, so we can just remove these two label terms, because the coefficients in front of each logarithm are equal to 1. [00:54:53] The first term is telling us that D should correctly label real data as 1; it is the first term of a binary cross-entropy. The second term is telling us that D should correctly label generated data as 0. So the difference from the classic cross-entropy we've seen is that the first summation is over the real mini-batch, and the summation in the second cross-entropy is over the generated mini-batch. Does that make sense? So we want D both to correctly identify the real data and to correctly identify the fake data.
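The two-term cost just described, a binary cross-entropy with y_real = 1 over the real mini-batch and y_generated = 0 over the generated mini-batch, can be written out directly. The helper name and the probability values below are hypothetical, just for illustration; `d_real` and `d_fake` stand for the discriminator's sigmoid outputs on the two mini-batches:

```python
import math

def discriminator_cost(d_real, d_fake):
    """J_D = -(1/m) sum log D(x)  -  (1/m') sum log(1 - D(G(z))).

    First term: real mini-batch, labeled 1. Second term: generated
    mini-batch, labeled 0. d_real / d_fake are D's output probabilities.
    """
    real_term = -sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator doing well: high scores on real data, low on fake.
good = discriminator_cost([0.9, 0.95, 0.8], [0.1, 0.05, 0.2])
# An undecided discriminator outputting one half everywhere.
undecided = discriminator_cost([0.5] * 3, [0.5] * 3)
print(round(good, 3), round(undecided, 3))   # the better D has the lower cost
```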
That's why we have two terms. [00:55:34] Now, what about the generator? What do you think the cost function of the generator should be? [Student: flip it; from the generator's side I only want to run the first half, because I don't have any yᵢ for the inputs coming into the generator.] Yeah, exactly. And yes, in your batch you will have a certain number of real examples and a certain number of generated examples; the generated examples have no impact on the first cross-entropy, and the same goes for the real examples on the second cross-entropy. Any other questions? [00:56:14] Okay, so coming back to the cost of the generator: what should it be? This is a tiny bit complicated, so let's move on, because we don't have too much time. The cost of the generator should basically say that G should try to fool D: the goal is to forge, to generate realistic samples, and in order to generate realistic samples we want to fool D. If G manages to fool D, and D is very good, it means G is very good, right? The problem is that it's a game: if D is bad and G fools D, it doesn't mean that G is good, because since D is bad, it doesn't detect the real versus fake examples very well. We want D to get very good and G to improve at the same time, until an equilibrium is reached, at a certain point where D will always output one half, like random probabilities, because it cannot distinguish the samples coming from G from the real samples. [00:57:18] So this cost function is basically saying that for generated images, we want D to classify them as 1. Okay. [00:57:59] So, how do you implement that? If you're using a deep learning framework, you've been building a graph, right? And at the end of your
graph you've been building your cost functions D that is [00:58:10] building your cost functions D that is very close to a binary cross-entropy [00:58:12] very close to a binary cross-entropy what you're going to just do is to [00:58:15] what you're going to just do is to define a node that is going to be minus [00:58:16] define a node that is going to be minus the cost function of D it's going every [00:58:20] the cost function of D it's going every time you're going to call the function J [00:58:23] time you're going to call the function J of G is going to run the graph [00:58:27] of G is going to run the graph that you define for JFD and run a an [00:58:30] that you define for JFD and run a an opposition operation an opposite of [00:58:32] opposition operation an opposite of operation same way propagate gradients [00:58:43] operation same way propagate gradients back the same way we're not going to [00:58:45] back the same way we're not going to propagate the same way we're going to [00:58:47] propagate the same way we're going to turn into a minus sign for the grade for [00:58:50] turn into a minus sign for the grade for the generator so you know you you back [00:58:53] the generator so you know you you back propagate on the on the on D and when [00:58:56] propagate on the on the on D and when you back propagate on G you would flip [00:58:57] you back propagate on G you would flip you would flip the sign that's all we do [00:59:01] you would flip the sign that's all we do the same thing with the sign fleet terms [00:59:03] the same thing with the sign fleet terms of implementation is just another [00:59:05] of implementation is just another operation okay now let's look at [00:59:08] operation okay now let's look at something interesting is that this Lord [00:59:12] something interesting is that this Lord logarithm let's look at the graph of the [00:59:17] logarithm let's look at the graph of the logarithm so I'm going to plot against [00:59:27] logarithm so 
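As a rough numpy sketch of that sign flip (the helper names `j_d` and `j_g` are made up here, and plain numbers stand in for the graph): the minimax generator cost is just the negated fake-sample term of the discriminator's binary cross-entropy, so its gradient is the discriminator's gradient with the sign flipped.

```python
import numpy as np

def j_d(d_real, d_fake):
    """Discriminator's binary cross-entropy: push D(x) -> 1, D(G(z)) -> 0."""
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def j_g(d_fake):
    """Minimax generator cost J(G) = -J(D), keeping only the fake term."""
    return np.log(1.0 - d_fake)

# Numerical gradients w.r.t. d_fake = D(G(z)): same magnitude, opposite sign.
d_fake, eps = 0.3, 1e-6
grad_d = (j_d(0.9, d_fake + eps) - j_d(0.9, d_fake - eps)) / (2 * eps)
grad_g = (j_g(d_fake + eps) - j_g(d_fake - eps)) / (2 * eps)
print(round(grad_d, 3), round(grad_g, 3))  # 1.429 -1.429
```

So one backward pass serves both players; the generator's update just negates the gradient flowing through the fake term.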
[00:59:32] Okay, now let's look at something interesting: this logarithm. Let's look at the graph of the logarithm. I'm going to plot, on the x-axis, D(G(z)). What does this mean? This axis is the output of D when given a generated example. D(G(z)) is going to be between 0 and 1, because it's a probability — D is a binary classifier with a sigmoid output. If we plot log(x), this type of thing, this would be log(D(G(z))). Does it make sense? It's the logarithm function. If I plot minus that — −log(D(G(z))) — or, let me do something else: let me plot another function, log(1 − D(G(z))). This is it — do you guys agree? Okay, so the question is: right now, what we're doing is saying the cost function of the generator is log(1 − D(G(z))), so it
[01:01:12] looks like this one. What's the issue with this one? What do you think is the issue with this cost function, looking at it like that? [Student answer] Sorry, can you say it louder? It goes to negative infinity at one — that's what you mean? Yeah. And the consequence of that is that the gradient here is going to be very large the closer we go to one, but the closer we are to zero, the lower the gradient. It's the reverse phenomenon for this logarithm: the gradient is very high — I mean very high in absolute value — when we're close to zero, but it's very low when we go close to one. Okay, so which loss function do you think would be better to train our generator: a loss function that looks like this one, or a loss function that looks like this one? The broader question is: where are we early in the training — are we close to here, or are we close to there? What does it mean to be
[01:02:36] close to one? You're fooling the network — it means D thinks that the generated samples are real. Here, this place, is the contrary: D thinks that the generated samples are fake — it correctly finds out that they're fake. Early on, we're generally here, because the discriminator is better than the generator: the generator outputs garbage at the beginning, and it's very easy for the discriminator to figure out that it's fake, because this garbage looks very different from real-world data. So early on, we're here. So which function is the best one to be our cost? Yeah — probably this one is better. So we have to use a mathematical trick to change this into that, and the mathematical trick is pretty standard: right now we're minimizing something that is in log(1 − x); we can say that doing so is the same as maximizing something that is in log(x) — a simple
[01:03:46] min–max flip — and we can also say that it's the same as minimizing something in −log(x). Does that make sense? So we're going to use this mathematical trick to convert our function from what we would call a saturating cost into a non-saturating cost that is going to look more like this. Let's see what it looks like. So, to sum up: our cost function currently looks like that. It's a saturating cost, because early on the gradients are small and we cannot train G. We're going to do the flip that I just talked about on the board and convert this into another function that is a non-saturating cost. Okay. [Student question] Yeah — the reason the blue one looks like that is because I added a minus sign here, so I'm flipping this. And it's the same thing; it's just the sign of the gradient that is going to be different. Like that, the gradient is high at the beginning and low at the end — that makes sense.
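To put numbers on that saturation (a quick sketch, not from the slides): the derivative of log(1 − x) is tiny where x = D(G(z)) is near 0 — exactly where training starts — while the derivative of −log(x) is large there.

```python
import numpy as np

x = np.array([0.01, 0.5, 0.99])        # D(G(z)); early training sits near 0.01

grad_saturating = -1.0 / (1.0 - x)     # d/dx of log(1 - x)
grad_non_saturating = -1.0 / x         # d/dx of -log(x)

print(np.abs(grad_saturating))         # ~1 near x=0, ~100 near x=1
print(np.abs(grad_non_saturating))     # ~100 near x=0, ~1 near x=1
```

With the saturating cost the generator barely learns at the start; the flipped cost puts the big gradients where they are needed.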
[01:04:53] So we're going to use this flip, and we have a new training procedure now where J(D) didn't change but J(G) changed: we have a minus sign here, and instead of log(1 − D(G(z))) we have log(D(G(z))). Does that make sense to everyone? Cool. And actually — this is a fun thing — you should check this paper, which is really cool: "Are GANs Created Equal?" It's a large study of many, many different GANs. It shows what people have tried, and you can see that people have tried all types of losses to make GANs trainable. It looks complicated here, but actually the MM GAN is the first one we saw together — the minimax loss function — and the second one is the non-saturating one we just saw. So you see, between the first two, the only difference is that on the generator, log(1 − D(x̂))
[01:05:54] becomes −log(D(x̂)). Okay. Now, another trick to train GANs is to use the fact that D is usually easier to train than G. But if D doesn't improve, G cannot improve — so you can see the performance of D as an upper bound on what G can achieve. Because of that, we will usually train D more times than we train G: we will basically alternate — k times D, one time G, k times D, one time G, and so on — so that the discriminator becomes better, then the generator can catch up, then D gets better, then G catches up, and so on. Does that make sense? There are also methods that use different learning rates for D and G to take this into account and train the discriminator faster. Okay — because we don't have too much time, I'm going to skip batch norm; we're going to see it together, probably next week, after you guys have seen the batch norm videos.
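That alternating schedule — k discriminator updates per generator update — can be sketched like this (the `d_step`/`g_step` callables are placeholders for real optimizer steps, not the lecture's code):

```python
def train_gan(d_step, g_step, num_iterations, k=5):
    """Alternate k discriminator updates with one generator update.

    d_step / g_step stand in for functions that run one SGD step on
    D or G and return the resulting loss.
    """
    history = []
    for _ in range(num_iterations):
        for _ in range(k):          # train D k times so it stays ahead...
            d_loss = d_step()
        g_loss = g_step()           # ...then train G once to catch up
        history.append((d_loss, g_loss))
    return history

# Toy run with dummy step functions that just return fixed losses:
log = train_gan(d_step=lambda: 0.7, g_step=lambda: 1.2, num_iterations=3)
print(log)  # [(0.7, 1.2), (0.7, 1.2), (0.7, 1.2)]
```

The ratio k (and, alternatively, a larger learning rate for D) is a tuning knob, not a fixed rule.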
[01:07:02] Okay, great, cool. So just to sum up, some tips to train GANs: modify the cost function — we've seen one modification, and there are many more; keep D up to date with respect to G, so update D more often than you update G; use virtual batch norm, which is a derivative of batch norm — a different type of batch norm is used here; and something called one-sided label smoothing, which I'm not going to talk about today because we don't have time. So let's see some nice results now — that's the most fun part. Some of you have worked with word embeddings, and you might know that word embeddings are vectors that can encode the meaning of a word, and you can sometimes compute operations on these words: if you take king minus queen, it should be equal to man minus woman — operations like that happen
[01:08:00] in the space of encodings. So here it's the same: you can use a generator to generate faces — the paper is listed at the bottom here. You give it a code, a random code, and it gives you an image of a face. You can give it a second code, and it's going to give you a second image that is different from the first one, because the code was different. You can give it a third one, and it's going to give you a third face. The fun part is, if you take code 1 minus code 2 plus code 3 — so basically, the image of a man with glasses, minus the image of a man, plus the image of a woman — it will give you an image of a woman with glasses. This is interesting, because it means that linear operations in the latent space of codes have a direct impact on the image space.
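A toy sketch of that code arithmetic (the `generate` function below is a stand-in, not the paper's model — the point is only that the arithmetic happens on latent vectors before generation):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 100                                 # a typical latent-code size

z_man_glasses = rng.normal(size=dim)      # code 1: man with glasses
z_man = rng.normal(size=dim)              # code 2: man
z_woman = rng.normal(size=dim)            # code 3: woman

# Linear arithmetic happens in latent space, before the generator:
z_new = z_man_glasses - z_man + z_woman

def generate(z):
    """Stand-in for a trained generator G: latent code -> image array."""
    return np.tanh(z).reshape(10, 10)     # fake 10x10 'image'

image = generate(z_new)                   # would be ~ woman with glasses
print(image.shape)  # (10, 10)
```

With a real trained generator, `generate(z_new)` is where the "woman with glasses" image would come out.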
[01:09:05] Okay, let's look at something even better. You can use GANs for image generation, of course — these are very nice samples. You see that sometimes GANs have problems with — oh no, I don't think that's a dog — but these are samples from a very impressive GAN that has been state of the art for a long time. Okay, so let's see something fun, something called image-to-image translation. Actually, the project winners last quarter, in spring, had a project dealing with exactly that: generating satellite images based on a map image. Given a map image, generate the satellite image using a GAN. You see that instead of giving a latent code that was 100-dimensional, you could give a very detailed code — the code can be this image — and you have to find a way to constrain your network in a certain way, to push it to output exactly the satellite image that corresponds to this map image. There are many other results that can be found: converting horses to zebras and zebras to horses, apples
[01:10:00] to oranges and oranges to apples. So let's do a case study together. Let's say our goal is to convert horses to zebras in images, and vice versa. Can you tell me what data we need? Let's go quickly, so that we have some time. [Student: horses and zebras.] Do you need paired images — you know, like, do you need the same image of your horse as a zebra? So the problem is: okay, we could have labeled images — a horse and its zebra doppelgänger in the same position — and we could train a network to take one and output the other. Unfortunately, not every horse has a doppelgänger that is a zebra, so we cannot do that. So instead we're going to do unpaired generative adversarial networks. It means we have a database of horses and a database of zebras, but these are different horses and different zebras. They're not one-to-one — there's no one-to-one mapping between them.
[01:11:00] There's no mapping at all. What architecture do you want to use? [Student answer] Nice — not really, okay. So let's see the architecture and the cost. I'm going to go over it quickly. It's a very fun GAN called CycleGAN. The way we're going to work it out is: we have a horse, called capital H, and we want to generate the zebra version of this horse. So we give it to a generator that we call G1 — you can call it H2Z, like horse-to-zebra — and it should give us this horse H as a zebra. And in fact, since we're training a GAN, we need a discriminator, so we will add a discriminator that is going to be a binary classifier, to tell us whether the image output by generator 1 is real or not. So this discriminator is going to take in some real images — of zebras, probably? Yes, zebras — and it's also going to take the generated images, and
[01:12:11] see which one is fake and which one is real. On the other hand, we're going to do the vice versa — and this is very important: we need to enforce the fact that this zebra, G1(H), should be the same horse as H. In order to do that, we're going to create another generator, which is going to take the generated image and generate back the input image, and this is where we will be able to enforce the constraint that G2(G1(H)) should be equal to H. Do you see why this loop is super important? Because if we don't have this loop, we don't have the constraint that the zebra should be the horse as a zebra — the same horse as H. So we'll do that, and we have a second discriminator to decide if this image is real. This is one step, H to Z; another step might be Z2H, where we start with the zebra, give it to a generator to generate the horse version of the zebra,
[01:13:12] discriminate, generate back the zebra, and discriminate again. Does that make sense? So this is the general pattern used in CycleGANs, and what I'd like to go over is: what loss should we minimize in order to enforce the fact that we want the horse converted to a zebra that is the same as the horse? Can someone give me the terms that we need? Does someone want to give it a try? Go for it. [Student: You want to make sure that the picture at the end — the zebra that you output — matches the zebra that you started with, or that the horse you output matches the horse you had initially. At the same time, you also need the discriminators identifying whether the image is a real zebra or a real horse, because you don't want it to just take the input image and output the same image back to you.]
[01:14:28] Okay, that's great. So you're saying we need the classic cost functions that we've seen previously, plus another one that is the matching between H and G2(G1(H)), and between Z and G1(G2(Z)). Correct. So we'll have all these terms: one term to train D1, which is the classic term we've seen — differentiate real images from generated images; G1 as well — same as before, we're using the non-saturating cost on generated images; same for D2, same for G2 — these are the classics. The one we need to add to all of this is the cycle cost, which is the distance between this term, G2(G1(H)), and H, and the same thing for zebras. Does that make sense? So you had the intuition to build that type of loss: we just sum everything, and that gives us the cost function we're looking for. [Student: Is it the same cost function for D1 and D2?] Yeah — you could, but it's not going to work that well, I think. Also, I think there's a tiny mistake here:
[01:15:37] the small zᵢ here should be a small hᵢ, and the small hᵢ on top should be a small zᵢ, because discriminator 1 is going to receive generated samples that look like zebras — they came out of G1 — so you want the real database that you give it to be zebras as well, to force the generator to output things that look like zebras; and vice versa for the second one. Okay. And this is my favorite: you can convert ramen into a face and back to ramen. It's the most fun application I found — it's from a Japanese research lab that is working hard on face-to-ramen. And actually, in two to three weeks you will learn object detection — you know, to detect faces — and once you learn that, maybe you can start a project to, like, detect a face and then replace it with ramen, because it's also funny — funny work by the same lab.
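Going back to the CycleGAN objective for a second: summing those terms — the adversarial terms plus the cycle cost — gives a loss of roughly this shape (a sketch with made-up helper names and toy tensors; the L1 cycle term and a weight λ follow the CycleGAN paper, but nothing here is the exact published code):

```python
import numpy as np

def adversarial_loss(d_scores_on_fakes):
    """Non-saturating generator term: -log D(fake)."""
    return -np.mean(np.log(d_scores_on_fakes))

def cycle_loss(x, x_reconstructed):
    """L1 distance between an input and its round-trip reconstruction."""
    return np.mean(np.abs(x - x_reconstructed))

def cyclegan_generator_loss(d1_on_fake_zebras, d2_on_fake_horses,
                            h, g2_g1_h, z, g1_g2_z, lam=10.0):
    """Adversarial terms for G1 and G2 plus the weighted cycle cost."""
    adv = adversarial_loss(d1_on_fake_zebras) + adversarial_loss(d2_on_fake_horses)
    cyc = cycle_loss(h, g2_g1_h) + cycle_loss(z, g1_g2_z)
    return adv + lam * cyc

# Toy tensors standing in for images and discriminator outputs:
h, z = np.ones((4, 4)), np.zeros((4, 4))
loss = cyclegan_generator_loss(
    d1_on_fake_zebras=np.array([0.6]), d2_on_fake_horses=np.array([0.4]),
    h=h, g2_g1_h=h * 0.9, z=z, g1_g2_z=z + 0.1)
print(round(loss, 3))  # 3.427
```

The discriminators D1 and D2 keep their own standard binary cross-entropy losses, trained separately.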
[01:16:43] There's also some funny work out there along these lines. Okay, oh, this is a super cool application as well, so let's look at that. This model is a conditional GAN that was conditioned on edges: it learned to take edges and generate cats based on the edges. So I'm going to try to draw a cat; sorry, I cannot see well, and I'm not a good drawer. [01:17:25] That's the cat, okay, it's going through the model; I hope it's going to work. [01:17:43] Okay, I don't think it works, but it's supposed to work. So you can generate cats based on edges, and you can do it for different things; you can do it for a shoe. All these models have been trained for that. [01:18:02] Yes, go for it. [Student question] [01:18:15] Right, you have to train it specifically for the domain, so these models are each trained on a different dataset. Okay, looking for my presentation; I missed it, the presentation disappeared. Okay.

[01:18:39] Another application is super-resolution: you give a low-resolution image and generate the super-resolution version of it using GANs. And this is pretty cool, because you can take a high-resolution image, downsample it, and use that as the minimax game: you have the high-resolution version of the very low-resolution image. Other applications can be privacy-preserving. Some people have been working in the medical space, where privacy is a huge issue; commonly, you cannot share datasets among hospitals or among medical teams. So people have been looking at generating a dataset that looks like a medical dataset: if you train a model on it, it's going to give you the same type of parameters as the real one, but the dataset is anonymized. So hospitals can share the anonymized data with each other and train their models, without being able to access the information of the patient.
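The pair-construction trick just described for super-resolution (take a high-resolution image, downsample it, and train the generator to recover the original) can be sketched in a few lines. This is a minimal NumPy illustration of my own, not the pipeline from the lecture; average pooling is an assumed choice of downsampler:

```python
import numpy as np

def make_sr_pair(hr, factor=4):
    """Turn one high-resolution grayscale image into a (low-res, high-res)
    training pair by average-pooling over factor x factor blocks."""
    h, w = hr.shape
    assert h % factor == 0 and w % factor == 0
    lr = hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return lr, hr  # generator input, generator target

hr = np.random.rand(64, 64)      # stand-in for a real high-resolution image
lr, target = make_sr_pair(hr)
print(lr.shape, target.shape)    # (16, 16) (64, 64)
```

The generator learns the mapping from lr to target, while the discriminator plays the minimax game of telling generated high-resolution images from real ones.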
[01:19:43] And manufacturing is important as well: GANs can generate very specific objects that can replace bones for humans, personalized to the human body. Same for dental: if you lose a tooth, the technician can take a picture and decide what the crown should look like, and the GAN can generate it.

[01:20:10] Another topic is how to evaluate GANs. You might say we can just look at the images and see if they look real, and that will give us an idea of whether the GAN is working well. In practice it's hard, because maybe the images you're looking at are overfitted copies of the real samples you gave to the discriminator. So how do you check that? It's very complicated. Human annotation is a big one: you build a piece of software, push it to the cloud, and people around the world select which images look generated and which look real, to see whether a human can tell your GAN's output from real-world data, and how your GAN performs. It would look like this: a web app where you indicate which image is fake and which is real. You can run different experiments: you can flash an image for a fraction of a second and ask whether it was real or not, or you can give people unlimited time.

[01:21:09] There's another approach that is more scalable, because human annotation is very painful; every time you train a GAN you'd want to do this to verify it's working well, and it takes a lot of time. So instead of using humans, why don't we use a network that is very good at classification? In fact, the Inception network is a tremendous classification network. We're going to give our image samples to this Inception network and see what the network thinks of each image: does it think it's a dog or not, does it look like a dog to the network? We can scale this and make it very quick. There's an Inception score that we can talk about next week when we have time; it measures the quality of the samples and also the diversity of the samples, and I'll go over it next week. There's another distance that has been growing in popularity recently, called the Fréchet Inception Distance, and I advise you to check some of these papers if you're interested in them for your projects.

[01:22:10] So, just to end: for next Wednesday you'll have C2M3 and also the whole C3 modules, so you'll have three quizzes. Be careful: the two quizzes C3M1 and C3M2 are longer than the normal quizzes, they're like case studies, so take your time and go over them. And you'll have one programming assignment.
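The Inception score mentioned above has a compact form: feed generated samples through a classifier, then take the exponential of the average KL divergence between each conditional class distribution p(y|x) and the marginal p(y). Here is a small numeric sketch of that formula (my own illustration, not course code):

```python
import numpy as np

def inception_score(probs):
    """Inception score from classifier softmax outputs, one row per image:
    IS = exp( mean_x KL(p(y|x) || p(y)) ). It is high when each image is
    confidently classified (quality) and the class marginal is spread out
    (diversity)."""
    p_y = probs.mean(axis=0)  # marginal class distribution over all samples
    kl = (probs * (np.log(probs) - np.log(p_y))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Confident and diverse predictions over 10 classes score high...
confident = np.eye(10) * 0.990 + 0.001   # each row peaks on a different class
print(inception_score(confident))        # close to the 10-class maximum
# ...while uninformative uniform predictions score exactly 1.
uniform = np.full((10, 10), 0.1)
print(inception_score(uniform))          # 1.0
```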
[01:20:32] Make sure you understand the batch norm videos, so that hopefully we can go over virtual batch norm together next week and in the hands-on section this Friday. You will receive feedback on your project proposal as soon as possible; meet with your project TAs to go over the proposal and to make decisions regarding the next steps for your projects. I'll stick around in case you have any questions. Okay, thanks guys.

================================================================================ LECTURE 005 ================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 5 - AI + Healthcare Source: https://www.youtube.com/watch?v=IM9ANAbufYM --- Transcript

[00:00:05] Thanks for being here. Welcome to lecture five of CS230. Today we have the chance to host a guest speaker, Pranav Rajpurkar, who is a PhD student in computer science advised by Professor Andrew Ng and Professor Percy Liang. Pranav is working on AI and high-impact projects, specifically related to healthcare and natural language processing, and today he is going to present an overview of AI for healthcare and dig into some projects he has led, through case studies. So don't hesitate to interact; I think we have a lot to learn from Pranav, and he's really an industry expert on AI for healthcare. I'll hand you the mic, Pranav. Thanks for being here.

[00:01:00] Thanks, Kian, thanks for inviting me. Can you hear me at the back? Is the mic on? All right, fantastic. Really glad to be here. I want to cover three things today. The first is to give you a broad overview of what AI applications in healthcare look like. The second is to bring you three case studies from the lab that I'm in, as demonstrations of AI-and-healthcare research. And then finally, some ways that you can get involved if you're interested in applying AI to high-impact problems in healthcare, or if you're from a healthcare background as well. Let's
start with the first. [00:01:48] One way we can decompose the kinds of things AI can do in healthcare is to formulate levels of questions that we can ask of data. At the lowest level are descriptive questions: here we're really trying to get at what happened. Then there are diagnostic questions, where we're asking why it happened: if a patient had chest pains and I took their X-ray, what does that chest X-ray show? If they have palpitations, what does their ECG show? Then there are predictive problems, where I care about the future: what's going to happen in the next six months? And then, at the highest level, are prescriptive problems. Here I'm really trying to ask: okay, I know this is the patient, these are the symptoms they're coming in with, this is what their trajectory will look like in terms of the things they're at risk of; what should I do? This is the real action point, and that's, I would say, the goldmine, but to get there requires a lot of data and a lot of steps, and we'll talk a little bit more about that.

[00:03:06] So in CS230 you're all well aware of the paradigm shift of deep learning, and if we look at the machine-learning-in-healthcare literature, we see that it has a very similar pattern. We had a feature-extraction engineer who was responsible for getting from the input to a set of features that a classifier can understand, and the deep learning paradigm is to combine feature extraction and classification into one step by automatically extracting features, which is cool. Here's what I think will be the next paradigm shift for AI in healthcare, but also more generally: we still have a deep learning engineer up here, that's you, that's me, designing the networks and making decisions like "a convolutional neural network is the best architecture for this problem", or picking the specific type of architecture: there's an RNN and a CNN and whatever-NN you can throw in there. But what if we could swap out the ML engineer as well? I find this quite funny, because a question that I get asked a lot in AI for healthcare is: are we going to replace doctors with all these AI solutions? And nobody actually realizes that we might replace machine learning engineers faster than we replace doctors. For this to be the case, a lot of research is developing algorithms that can automatically learn architectures, some of which you might go through in this class.

[00:04:46] Great, so that's the general overview. Now I want to talk about three case studies from the lab, of AI being applied to different problems, and because healthcare is so broad, I thought I'd focus in on one narrow
vertical and let us go deep on that: medical imaging. So I've chosen three problems; the first is a 1D problem, the second is a 2D problem, and the third is a 3D problem, so I thought we could walk through all the different kinds of data here.

[00:05:22] This is some work that was done early last year in the lab, where we showed that we were able to detect arrhythmias at the level of cardiologists. Arrhythmias are an important problem that affects millions of people. This has especially come to light recently with devices like the Apple Watch, which now has ECG monitoring. And the thing about this is that sometimes you might have symptoms and know that you have arrhythmias, but other times you may not have symptoms and still have arrhythmias that could be addressed, if you were to do an ECG. The ECG test basically shows the heart's electrical activity over time. The electrodes are attached to the skin, it's a safe test, and it takes a few minutes; this is what it looks like when you're hooked up to all the different electrodes. So this test is often done for a few minutes in the hospital, and the finding is basically that in a few minutes you can't really capture a person's abnormal heart rhythms. So let's send them home for 24 to 48 hours with a Holter monitor and see what we can find. There are more recent devices, such as the Zio patch, which let patients be monitored for up to two weeks, and it's quite convenient: you can use it in the shower or while you're sleeping, so you really can capture a lot of what's happening in the heart's ECG activity. But if we look at the amount of data generated in two weeks, it's 1.6 million heartbeats. That's a lot, and there are very few doctors who'd be willing to go through two weeks of ECG readings for each of their patients, and this really motivates why we need automated interpretation here.

[00:07:22] But automated detection comes with challenges. One of them is that in the hospital you have several electrodes, and in the more recent devices we have just one. The way one can think of several electrodes is that the electrical activity of the heart is 3D, and each of the electrodes gives a different 2D perspective into that 3D activity; now that we have only one lead, we only have one of these perspectives available. The second is that the differences between heart rhythms are very subtle. This is what a cardiac cycle looks like, and when we're looking at arrhythmias or normal heart rhythms, one is going to look at the substructures within the cycle, and at the structure between cycles as well, and the differences are quite subtle.

[00:08:26] So when we started working on this problem... oh, maybe I should share this story. When we started working on this problem, it was me, my collaborator Awni, and Professor Ng, and one of the things he mentioned we should do, he said: let's just go out and read ECG books, and let's do the exercises. If you're in med school there are these books where you can learn about ECG interpretation, and then there are several exercises you can do to test yourself. So I went to the med school library (you know, they have those hand-crank shelves at the bottom you have to move), grabbed my books, and for two weeks we went through two books and learned ECG interpretation, and it was pretty challenging. And if we looked at
the prior literature, I think it was drawing on some domain knowledge, in that: here we're looking at waves, so how can we extract from the waves specific features that doctors are also looking at? So there was a lot of feature engineering going on. If you're familiar with wavelet transforms, they were the most common approach, with a lot of different mother wavelets and so on, plus pre-processing and bandpass filters; everything you can imagine doing with signals was done, and then you fed it into your SVM and called it a day.

[00:09:54] Now with deep learning we can change things up a bit. On the left we have an ECG signal, and on the right are just three heart rhythms; we're going to call them A, B and C, and we're going to learn the mapping straight from the input to the output. Here's how we're going to break it up: we're going to say that every label labels the same amount of the signal, so if we had four labels, the ECG would be split into four parts, and this rhythm labels this part. And then we're going to use a deep neural network. We've built a 1D convolutional neural network which runs over the time dimension of the input, because remember, we're getting one scalar over time, and this architecture is 34 layers deep.

[00:10:54] So I thought I'd talk a little bit about the architecture. Have you seen ResNets before? Okay, so should I go into this? Okay, cool, here's my one-minute spiel of ResNet then. As you go deeper in terms of the number of layers in a network, you should be able to represent a larger set of functions. But when we look at the training error for these very deep networks, what we find is that it's worse than for a smaller network. Now, this is not the validation error; this is the training error. That means that even with the ability to represent a more complex function, we aren't able to fit the training data. So the motivating idea of residual networks is to say: hey, let's add shortcuts within the network, so as to minimize the distance from the error signal to each of my layers. This is just math to say the same thing. Further work on ResNet asked: okay, we have the shortcut connection, so how should we make information flow through it best? And the finding was basically that anything you add onto the shortcut (think of these as stop signs or signals on a highway) slows things down; the fastest highway has nothing but addition on it. Then there were a few advancements on top of that, like adding dropout and increasing the number of filters in the convolutional neural network, which we also added to this network.
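The two ingredients just described, a 1D convolution running over the time dimension and a residual shortcut that adds the block's input straight to its output, can be sketched as follows. This is an illustrative NumPy toy, not the actual 34-layer network:

```python
import numpy as np

def conv1d(x, w):
    """'Same'-padded 1D convolution over time: x has shape (T,), w has shape (k,)."""
    k = len(w)
    xp = np.pad(x, (k // 2, k // 2))
    return np.array([np.dot(xp[t:t + k], w) for t in range(len(x))])

def residual_block(x, w1, w2):
    """y = x + F(x): the shortcut carries x past two conv layers (with a ReLU),
    keeping a short path from the loss back to early layers."""
    h = np.maximum(conv1d(x, w1), 0.0)   # conv + ReLU
    return x + conv1d(h, w2)             # nothing but addition on the shortcut

T = 16
x = np.random.randn(T)                   # one-lead ECG: one scalar per time step
w1, w2 = np.random.randn(3), np.random.randn(3)
y = residual_block(x, w1, w2)
print(y.shape)                           # (16,)
# Zeroing the residual branch leaves exactly the identity:
print(np.allclose(residual_block(x, w1, np.zeros(3)), x))  # True
```

Stacking blocks like this (with pooling and growing numbers of filters) and reading out one rhythm label per fixed-length segment of the signal gives the general shape of the network described above.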
of filters in the convolutional neural network that we [00:12:47] convolutional neural network that we also added to this network okay so [00:12:50] also added to this network okay so that's the convolutional neural network [00:12:52] that's the convolutional neural network let's talk a little bit about data so [00:12:56] let's talk a little bit about data so one thing that was cool about this [00:12:57] one thing that was cool about this project was that we got to partner up [00:13:00] project was that we got to partner up with a with a startup that manufactures [00:13:05] with a with a startup that manufactures these hardware patches and we got data [00:13:08] these hardware patches and we got data off of patients who were wearing these [00:13:10] off of patients who were wearing these patches for up to two weeks and this was [00:13:15] patches for up to two weeks and this was from around 30,000 patients and this is [00:13:19] from around 30,000 patients and this is 600 times bigger than the largest data [00:13:21] 600 times bigger than the largest data set that that was out there before and [00:13:25] set that that was out there before and for each of these ECG signals what [00:13:28] for each of these ECG signals what happened [00:13:29] happened is that each of them is annotated by a [00:13:32] is that each of them is annotated by a clinical ECG expert who says here's [00:13:35] clinical ECG expert who says here's where rhythm a starts and here's where [00:13:37] where rhythm a starts and here's where ends so let's mark the whole ECG that [00:13:39] ends so let's mark the whole ECG that way obviously very time-intensive [00:13:41] way obviously very time-intensive but a good data source and then we had a [00:13:44] but a good data source and then we had a test set as well and here we use here we [00:13:48] test set as well and here we use here we use a committee of cardiologists so [00:13:51] use a committee of cardiologists so they'd get together sit in a 
room, and decide: OK, we disagree on this specific point, let's discuss which one of us is right, or what this rhythm actually is. So they arrive at a ground truth after discussion. [00:14:06] And then we can of course test cardiologists as well, and the way we do this is we have them do it individually. This is not the same set that did the ground truth; there's a different set of cardiologists coming in, one at a time: you tell me what's going on here, and we're going to test you. [00:14:18] So when we compared the performance of our algorithm to cardiologists, we found that we were able to surpass them on the F1 metric, which combines precision and recall. And when we looked at where the mistakes were made, we could see that the biggest mistake was in distinguishing two rhythms which look very, very similar but actually don't have a difference in treatment.
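The F1 metric mentioned here is the harmonic mean of precision and recall; a minimal sketch from raw error counts:

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)   # of the positives we predicted, how many were right
    recall = tp / (tp + fn)      # of the true positives, how many we found
    return 2 * precision * recall / (precision + recall)
```

For example, 8 true positives with 2 false positives and 2 false negatives gives precision = recall = 0.8, so F1 = 0.8.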
[00:14:55] Here's another case where the model is not making a mistake which the experts are making, and it turns out this is a costly mistake: what experts thought was a benign heart rhythm was actually a pretty serious one. So that's one beauty of automation, that we're able to catch these misdiagnoses. [00:15:21] Here are three heart blocks which are clinically relevant to catch, on which the model outperforms the experts, and on atrial fibrillation, which is probably the most common serious arrhythmia, the same holds. [00:15:39] One of the things that's neat about this application, and a lot of applications in health care, is that what automation with deep learning and machine learning enables is for us to continuously monitor patients, and this is not something we've been able to do before. So even a lot of the science of understanding
what patients' risk factors are, or how they change, hasn't been done before, and this is an exciting opportunity to be able to advance science as well. [00:16:09] And the Apple Watch has recently released its ECG monitoring, and it'll be exciting to see what new things we can find out about the health of our hearts from these inventions. OK, so that was our first question, yeah? [00:16:42] So to repeat the question: how was it to deal with data privacy, and to keep patients' information private? In this case we had completely de-identified data, so it was just an ECG signal, without any extra information about their clinical records or anything like that. [00:17:07] [Audience follow-up, partly inaudible, about whether a credible authority, such as the hospitals, signed off on the data use.] Sure, and I think we can
take this question offline as well. But one of the beauties of working at Stanford is that there's a lot of industry research collaboration, and we have great infrastructure to be able to work with that. [00:17:40] Which brings me to my second case study. Sorry, yeah, go for it. [00:18:05] That's a good question. So just to repeat the question: how did we define a gold standard, given that it's experts who set the gold standard? Here's how we did it. One way to come up with a gold standard is to ask: if we looked at a consensus, what would it say? And so we got three cardiologists in a room to set the gold standard. Then, to compare the performance of experts, these were individuals separate from that group of cardiologists, who sat in another room and said what they thought of the ECG signals. That way there can be some disagreement, with the gold standard set
by the committee. [00:18:48] Great. So here we looked at how we can detect pneumonia off of chest X-rays. Pneumonia is an infection that affects millions in the U.S.; its big global burden is actually in kids, so that's where it's really useful to be able to detect it automatically and well. [00:19:18] To detect pneumonia there's a chest X-ray exam, and chest X-rays are the most common imaging procedure, with two billion chest X-rays done per year. The way abnormalities are detected in chest X-rays is that they present as areas of increased density: where things should appear dark they appear brighter, or vice versa. [00:19:52] And here's what pneumonia characteristically looks like, something like a fluffy cloud. But this is an oversimplification, of course, because pneumonia is when the alveoli fill up with pus, and the alveoli can fill up with a lot of other things as well, which lead to very
different interpretations, diagnoses, and treatment for the patient. So it's quite confusing, which is why radiologists train for years to be able to do this. [00:20:19] The setup is: we'll take an input image of someone's chest X-ray and output a binary label, 0 or 1, which indicates the presence or absence of pneumonia, and here we use a 2D convolutional neural network which is pre-trained on ImageNet. [00:20:41] OK, so we looked at shortcut connections earlier, and DenseNets had this idea of taking shortcut connections to the extreme: what happens if we connect every layer to every other layer, instead of having just the one shortcut that ResNet had? And DenseNet beat the previous state of the art, with generally lower error and fewer parameters, on the ImageNet challenge. So that's what we used.
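The dense connectivity just described can be sketched abstractly: each layer consumes the concatenation of the block input and all earlier layers' outputs. This is a hedged sketch, not DenseNet itself (the real blocks use convolutional layers with a fixed growth rate of output channels):

```python
import numpy as np

def dense_block(x, layers):
    """Each layer sees the channel-wise concatenation of the input
    and every earlier layer's output: the DenseNet connectivity."""
    features = [x]
    for layer in layers:
        features.append(layer(np.concatenate(features, axis=-1)))
    return np.concatenate(features, axis=-1)
```

If each layer emits k new channels (the growth rate), a block over a w-channel input ends with w + len(layers) * k channels, which is where the parameter efficiency comes from: layers stay narrow while reusing all earlier features.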
[00:21:20] For the data set: when we started working on this project, which was around October of last year, there was this large data set that had been released by the NIH, a hundred thousand chest X-rays, and this was the largest public data set at the time. Here each X-ray is annotated with up to 14 different pathologies, and the way this annotation works is that there's an NLP system which reads a report and then outputs, for each of several pathologies, whether there is a mention, or whether there is a negation, like "not pneumonia" for instance, and then annotates accordingly. [00:22:03] And then for a test set we had four radiologists here at Stanford independently annotate and tell us what they thought was going on in those X-rays. [00:22:16] So one of the questions that comes up often in medical imaging is: we have a model, we have several experts, but we don't really have a ground truth.
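The mention/negation logic can be illustrated with a toy rule-based labeler. This is purely a sketch under assumed negation cues; the actual NIH labeling system is far more elaborate:

```python
import re

NEGATION_CUES = ("no ", "not ", "without ", "negative for ")  # assumed cues

def label_report(report, pathology):
    """Return 1 for a positive mention, 0 for a negated mention,
    and None when the pathology is not mentioned at all."""
    for sentence in re.split(r"[.;]", report.lower()):
        if pathology in sentence:
            return 0 if any(cue in sentence for cue in NEGATION_CUES) else 1
    return None
```

A labeler like this turns free-text radiology reports into the per-pathology labels used for training, at the cost of some label noise when the rules misfire.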
We don't have a ground truth for several reasons; one of them is just that it's difficult to tell whether someone had pneumonia or not without additional information, like their clinical record, or even: once you gave them antibiotics, did they get treated? [00:22:49] So really, one way to evaluate whether a model is better than a radiologist, or doing as well as the radiologist, is by asking: do they agree with other experts similarly? So that's the idea we use here. We say, OK, let's have one of the radiologists be the prediction model we're evaluating, and let's set another radiologist to be the ground truth, and now we're going to compute the F1 score. Then change the ground truth and do it a second time, change it again for a third, and then also use the model as the ground truth and do it again. And we can use the very same symmetric evaluation scheme, but this time having the model be evaluated against each of the four experts.
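The symmetric scheme can be sketched as: score every rater, model included, by its mean F1 against each other rater taken in turn as the ground truth. This is a simplified single-label binary sketch of the idea, not the paper's exact evaluation code:

```python
def f1(pred, truth):
    """F1 between two binary label vectors."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # perfect agreement on all-negative

def symmetric_f1(raters):
    """Mean F1 of each rater against every other rater as ground truth."""
    return {
        name: sum(f1(pred, other) for k, other in raters.items() if k != name)
              / (len(raters) - 1)
        for name, pred in raters.items()
    }
```

Because every rater is scored the same way, the model's number is directly comparable to each expert's, even though no single rater is privileged as the truth.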
[00:23:45] So we do that, and then we get a score for all of the experts and for the model, and we showed in our work that we were able to do better than the average radiologist at this task. [00:23:55] Two ways to extend this in the future are to look at patient history as well, and to look at lateral radiographs, to be able to improve upon this diagnosis. At the time at which we released our work, we were able to outperform the previous state of the art on all 14 pathologies. [00:24:25] OK, so model interpretation. [00:24:33] [Audience question, partly garbled in the recording: a patient with pneumonia presents to the doctor with symptoms like fever and cough, but those aren't included in the model. If you're using a data set only to determine whether this person has pneumonia, you're not looking at other conditions, say cancer or black lung disease, that aren't in the images you're training on. The obvious case doesn't give you much trouble, but the tough case is a patient with a fever who's coughing violently: is it cancer, pneumonia, or black lung disease? How do you get the algorithm working in that condition, and, to keep it technical for the class, is there a neural network architecture you would use to solve this, multi-task learning perhaps?] Sure, sure. [00:25:49] OK, so let me try to boil that set of questions down. One is: patients are coming in, and we're not getting access to their clinical histories, so how are we able to make this determination at all? One thing is that when we're training the algorithm, we're training it on pathologies extracted from radiology reports, and these radiology reports are written with an understanding of the full clinical history, and an understanding of what the patient presented with in terms of symptoms as well. So we're training the model on these radiology reports, which had access to more information. [00:26:32] And the second is that the utility of this is not as much in being able to compare a patient's X-rays
day to day, as much as: here is a new patient with a set of symptoms, and can we identify things from their chest X-rays? Which brings us to model interpretation: what if you were an end-user of the model? [00:27:05] So when I was back in undergrad and I was in the lab, we were working on autonomous cars, and I thought about this a lot. How many of you have been in an autonomous car? How many of you would trust being in an autonomous car? [00:27:29] Yeah, I thought about this as well: would I trust being in an autonomous car? And I thought it'd be pretty sweet if the algorithm that was in the car would tell me whatever decision it was going to make in advance (I know that's not possible at high speeds), so that, just in case I disagreed with a particular decision, I could say no, abort, and have the model remake its decision. [00:27:53] And I think the same holds true in healthcare
as well. The one advantage in healthcare is that rather than having to make decisions within seconds, as in the case of the autonomous car, there is often a larger time frame, like minutes or hours, that we have. And here it's useful to be able to inform the clinician who's treating the patient, to say: hey, here's what my model thought, and why. [00:28:23] So here's the technique we use for that: class activation maps, which you may cover in another lecture, so I'll just leave it at saying that there are ways of looking at which parts of the image are most indicative of a particular pathology, to generate these heat maps. [00:28:50] So here's a heat map generated for pneumonia: this X-ray has pneumonia, and the algorithm, in red, is able to highlight the areas it thought were most problematic.
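For a network ending in global average pooling plus a linear classifier, a class activation map can be sketched as a weighted sum of the final convolutional feature maps, each weighted by the classifier weight its pooled value receives for the class of interest. A minimal sketch with illustrative shapes:

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM sketch: feature_maps is (C, H, W), class_weights is (C,).

    Sum the feature maps weighted by the class's classifier weights,
    then normalize to [0, 1] so the result can be shown as a heat map.
    """
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # (H, W)
    cam -= cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting low-resolution map is typically upsampled to the input size and overlaid in color on the X-ray, which is what produces the red highlights described here.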
[00:29:11] Here's one in which it's able to find a collapsed right lung; here's one in which it's able to find a small cancer. [00:29:20] And here the goal is to be able to improve healthcare delivery, where in the developed world one of the things this is useful for is prioritizing the workflow, making sure the radiologists are getting to the patients most in need of care before the ones whose X-rays look more normal. [00:29:42] But the second part, which I'm quite excited about, is to increase access to medical imaging expertise globally, where right now the World Health Organization estimates that about two-thirds of the world's population does not have access to diagnostics. And so we thought, hey, wouldn't it be cool if we just made an app that allowed users to upload images of their X-rays and was able to give its diagnosis? So this is still in the works, so I'll show
you what we've got running locally. [00:30:25] So here I'm presented with a screen that asks me to upload an X-ray, and I have several X-rays here, and I'm going to pick the one that says cardiomegaly. Cardiomegaly refers to enlargement of the heart. [00:30:44] So I uploaded it, and now the model's running in the backend, and within a couple of seconds it has output its diagnosis on the right. So you'll see the 14 pathologies that the model is trained on being listed, and next to each of them a bar, and at the top of this list is cardiomegaly, which is what this patient has, with the highest output. [00:31:10] And if I hover on cardiomegaly, I can see the probability displayed there. And since we talked about interpretation: how do I believe that this model is actually looking at the heart rather than at something else? If I click on it, I get the class activation map
for this, which shows that indeed it is focused on the heart, and is looking at the right thing. So I guess you can say the algorithm's heart is in the right place. [00:31:49] Cool. So this is an image that I got from the NIH data set that we were using, but it's pretty cool if an algorithm is able to generalize to populations beyond that. So I thought what we could do is just look up an image of cardiomegaly and download it, and see if our model is able to handle it. [00:32:14] This one looks pretty large; so does this one; I don't want an annotated one. All right, that's good, so we can take that, save it to the desktop, and now we can upload it here. [00:32:46] And it's already done its thing, and at the top is cardiomegaly once again, and there's the highlight. So it's able to generalize to populations beyond just the ones it was trained on, and I'm very excited by that. And what I got even more
even more [00:33:10] excited by that and what I got even more excited by is we're thinking of [00:33:13] excited by is we're thinking of deploying this out in out in different [00:33:16] deploying this out in out in different parts of the world and when we got an [00:33:19] parts of the world and when we got an image that showed how x-rays are read in [00:33:24] image that showed how x-rays are read in this hospital that we were working with [00:33:27] this hospital that we were working with in Africa this is what we saw and so the [00:33:32] in Africa this is what we saw and so the idea that one could snap a picture and [00:33:34] idea that one could snap a picture and upload it seems and get a diagnosis [00:33:37] upload it seems and get a diagnosis seems very powerful so the third case [00:33:42] seems very powerful so the third case study I want to take you through is [00:33:44] study I want to take you through is being able to look at M R so we've [00:33:46] being able to look at M R so we've talked about 1d a 1d setup where we had [00:33:50] talked about 1d a 1d setup where we had an ECG signal we've talked about a 2d [00:33:52] an ECG signal we've talked about a 2d setup with an x-ray how many of you [00:33:55] setup with an x-ray how many of you thinking of working on a 3d problem for [00:33:59] thinking of working on a 3d problem for your project whew that's good cool so [00:34:07] your project whew that's good cool so here we looked at niyama so mrs of the [00:34:10] here we looked at niyama so mrs of the knee is the standard of care to evaluate [00:34:13] knee is the standard of care to evaluate knee disorders and more mr examinations [00:34:17] knee disorders and more mr examinations are performed on the knee than any other [00:34:19] are performed on the knee than any other part of the body and the question that [00:34:24] part of the body and the question that we sought out to answer was can we [00:34:27] we sought out to answer was can we identify 
knee abnormalities [00:34:32] (two of the most common ones being an ACL tear and a meniscal tear) at the level of radiologists? Now, with the 3D problem, one thing that we have that we don't have in a 2D setting is the ability to look at the same thing from different angles. So when radiologists do this diagnosis, they look at three views, the sagittal, the coronal, and the axial, which are three ways of looking through the 3D structure of the knee. And in an MR you get different types of series based on the magnetic fields, and so there are three different series that are used. [00:35:19] What we're going to do is output, for a particular knee MR examination, the probability that it's abnormal, the probability of an ACL tear, and the probability of a meniscal tear. An important thing to recognize here is that this is not a multi-class problem, in that I could
have both types of tears; [00:35:41] it's a multi-label problem. So we're going to train a convolutional neural network for every view-pathology pair, so that's nine convolutional networks, and then combine them together using a logistic regression. [00:36:06] Here's what each convolutional neural network looks like: I have a bunch of slices within a view, I'm going to pass each of them to a feature extractor, and I'm going to get an output probability. We had 1,400 knee MR exams from the Stanford Medical Center, and we tested on 120 of them, where the majority vote of three subspecialty radiologists established the ground truth. We found that we did pretty well on the three tasks, with the model able to pick up the different abnormalities pretty well, and one can extend these methods of interpretability to 3D inputs as well. So that's what we did here.
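The view-by-pathology setup described here can be sketched in a few lines. This is an illustrative mock-up, not the lab's actual code: the nine CNNs are faked with label-correlated scores, and the per-pathology combiner is a tiny hand-rolled logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
views = ["sagittal", "coronal", "axial"]
pathologies = ["abnormal", "acl", "meniscus"]  # multi-label: not mutually exclusive

# Stand-ins for the nine trained view/pathology CNNs: each maps an exam
# to one probability. Here they are faked with label-correlated scores.
n = 200
labels = {p: rng.integers(0, 2, n) for p in pathologies}
cnn_prob = {(v, p): labels[p] * 0.5 + rng.uniform(0.0, 0.5, n)
            for v in views for p in pathologies}

def fit_logreg(X, y, lr=0.5, steps=500):
    """Tiny logistic regression trained by gradient descent (illustration only)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        z = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        g = z - y                               # gradient of the log loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# One combiner per pathology, fed that pathology's three view probabilities.
combiner = {}
for p in pathologies:
    X = np.column_stack([cnn_prob[(v, p)] for v in views])  # shape (n, 3)
    combiner[p] = fit_logreg(X, labels[p])

def predict(p, exam_idx):
    """Probability of pathology p for one exam, combining the three views."""
    w, b = combiner[p]
    x = np.array([cnn_prob[(v, p)][exam_idx] for v in views])
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))

print({p: round(predict(p, 0), 3) for p in pathologies})
```

Because the labels are not mutually exclusive, each pathology gets its own combiner and its own independent probability, rather than one softmax over three classes.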
[00:37:09] Okay, so I saw this cartoon a few weeks ago and I thought it was pretty funny. The joke is that a lot of machine learning engineers think they don't need to externally validate, which is: find out how my model works on data that's not from where my original dataset came from, so there's a difference in distribution. But it's really quite exciting when a model does generalize to datasets that it's not seen before. [00:37:46] And so we got this dataset that's public, from a hospital in Croatia, and here's how it was different: it was a different kind of series, with different magnetic properties, it was a different scanner, and it was a different institution in a different country. And we asked: okay, what happens when we run this model off the shelf, trained on Stanford data but tested on that kind of data?
[00:38:12] And we found that it did relatively well without any training at all, but then when we trained on it, we found that we were able to outperform the previously best reported result on the dataset. So there's still some work to be done in being able to generalize, sort of, my network here that was trained on my data to work on datasets from different institutions and different countries as well, but we're making some steps along the way; it remains a very open problem. [00:38:53] [Audience question.] Yeah, so we did the best we could in terms of processing. One of the pre-processing steps that's important is getting the mean of the input data to be as close as possible to the mean of the input data that you trained on. So that was one pre-processing step we tried; beyond that we tried to minimize processing, to say: out of the box, how would this work if we had never seen this data before?
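That mean-matching step can be sketched as follows. The talk only mentions matching the mean; matching the spread as well is a common variant and is my addition here, and all the arrays are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: training images from one scanner, and external
# images from a different scanner with a shifted intensity distribution.
train_imgs = rng.normal(loc=100.0, scale=20.0, size=(50, 32, 32))
external_imgs = rng.normal(loc=160.0, scale=35.0, size=(10, 32, 32))

# Shift (and here also rescale) the external data so its global mean
# matches the training data's: the model then sees inputs in the
# intensity range it was trained on.
train_mean, train_std = train_imgs.mean(), train_imgs.std()
ext_mean, ext_std = external_imgs.mean(), external_imgs.std()
matched = (external_imgs - ext_mean) / ext_std * train_std + train_mean

print(matched.mean() - train_mean)  # ~0: the means now agree
```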
[00:39:23] How would it work on that population? So, one big topic across a lot of applied fields is asking the question: okay, we're talking about models working automatically, autonomously; how would these models work when working together with experts in different fields? And here we asked that question about radiologists and about imaging models: would it be possible to boost the performance if the model and the radiologist work together? [00:40:08] So that's really the setup: a radiologist with the model, is that better than the radiologist by themselves? And here's how we set it up: we said, let's have experts read the same case twice, separated by a certain number of weeks, and then see how they would perform on the same set of cases. And what we found is that we were able to increase the performance generally, with a significant increase in specificity for ACL tears.
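To make the metric concrete: specificity is the fraction of truly negative cases correctly called negative. The numbers below are made up for illustration; they are not the study's results.

```python
import numpy as np

# Toy reads for "ACL tear" (1 = tear present). Cases 0-5 have no tear.
truth      = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
unassisted = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 1])  # radiologist alone
assisted   = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 1])  # radiologist + model

def sens_spec(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    # sensitivity: tears found; specificity: no-tear cases correctly cleared
    return float(tp / (tp + fn)), float(tn / (tn + fp))

print(sens_spec(truth, unassisted))  # (0.75, 0.5): 3 of 6 no-tear cases flagged
print(sens_spec(truth, assisted))    # (0.75, 0.833...): only 1 false alarm left
```

A specificity gain at equal sensitivity means fewer patients without a tear get flagged, which is exactly the "no unnecessary follow-up" point made below.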
[00:40:48] That means if a patient came in without an ACL tear, I'd be able to find that out better. So in the future... yes, question? [00:41:03] [A student asks whether seeing the model's output biases what the radiologist actually looks at in the exam.] Yeah, so that's a good question, and I think the term automation bias captures a lot of this: once we have models working with experts together, can we expect that the experts will sort of take it less seriously? That's a big concern: that they start relying on what the model says, and say, I won't even look at this exam, I'm just going to trust what the model says blindly. That's absolutely possible, and a very open area of research. [00:41:46] Some of the ways that people have tried to address it is to say: you know what, from time to time I'm going to pass in an exam to the
radiologist for which I'm going to flip [00:41:58] the answer, and I'll know the right one, and if they get that wrong, I'll alert them: you're relying too much on the model, stop. But there are a lot of more sophisticated ways to go about addressing automation bias, and as far as I know it's a very open field of research, especially as we're getting into deep learning assistants. [00:42:23] And one utility of this is to say, basically, this set of patients doesn't need follow-up: let's not send them for unnecessary surgery. Great. So I shared three case studies from the lab; the final thing I want to do is talk a little bit about how you can get involved if you're interested in applications of AI to healthcare. [00:42:52] The first is the ability for you to just get your hands dirty with datasets and be able to try out your own model. We have, from our lab, released the MURA dataset, which is a
large dataset of bone x-rays, and the [00:43:09] task is to be able to tell if the x-rays are normal or not. They come from different parts of the upper body, and that's what the dataset's x-rays look like. This is a pretty interesting setup, because you have more than one view, more than one angle, for the same body part, for the same study, for the same patient, and the goal is to be able to combine these well in a convolutional neural network and output the probability of abnormality. [00:43:45] One of the interesting things here, for transfer learning as well, is: do you want to train the models differently per body part, do you want to train the same model for all body parts, or do you want to combine certain models? There are a lot of design decisions there. And this is what trained models look like: a baseline model that
[00:44:08] we released, which is able to identify a fracture here and a piece of hardware on the right. And you can download the dataset from our website: if you google "MURA dataset", or go to our website, stanfordmlgroup.github.io, you should be able to find it. [00:44:28] The second way to get involved is through the AI for Healthcare Bootcamp, which is a two-quarter-long program that our lab runs. It provides students coming out of classes like CS230 an opportunity to get involved in research: students receive training from PhD students in the lab and medical school faculty to work on structured projects over two quarters, and if you have a background in AI, which you do, then you're encouraged to apply. We're working on a wide set of problems across radiology, EHR, public health, and pathology right [00:45:12] now. This is what the lab looks like; we have a lot of fun. [00:45:22] And the applications for the
bootcamp, [00:45:27] starting in the winter, are now open. The early-application deadline is November 23rd, and you can go to this link and apply. So that's my time; thank you so much for having me. [00:45:55] I'll take a couple of questions. [A student asks: you answered a question about privacy concerns; in terms of other ethics concerns, what about compensation for the medical experts you're potentially putting out of business with a model like the one you're developing, and in general, given that their knowledge is being used to train these models? It's not free.] [00:46:19] Yeah, so the question was: we're having these automated AI models trained with the knowledge of medical experts; what are the ways in which we're thinking of compensating these medical experts, right now or in the future, when we have possibly automated models? I think a lot of people are thinking about these
[00:46:44] problems and working on them right now. There are a variety of approaches that people are thinking about in terms of economic incentives, and there's a lot of fear about whether AI will actually work with, or augment, experts in whatever field they're working in. I don't have a great silver bullet for this, but I know there's a lot of work going on there. [00:47:18] [A student asks: when you're reading through MRIs, you showed looking at four or five categories of issues; it's possible that a human looking at the exam could point out something that was not being looked at by the AI model at that time, so how do you handle that?] Yeah, that's a great question. Just to repeat the question: we're looking at MR exams, and we're saying that for these three pathologies we're able to output the probabilities; what happens if there's another
pathology [00:47:54] that we haven't looked at? So I have a couple of answers for that. The first is that one of the categories here was simply to tell whether the exam was normal or abnormal, so the idea is that the abnormality class will capture a lot of different pathologies, at least the ones seen at Stanford. But it's often the case that we're building for one particular pathology, and then there's obviously a burden on the model and the model developers to be able to convey: hey, look, our model only does this, and you really need to watch out for everything else that the model doesn't cover. [00:48:33] Maybe that's it, unless there's one more question? No? All right, that's the last question we'll take then; thank you once again. [00:48:48] So now you've got... is the microphone working? Yeah. Now you've got the perspective of an AI researcher
working [00:48:57] in healthcare; now you are going to be the AI researcher working in healthcare. We're going to go over a case study that is targeted at skin disease. You know, in order to detect skin disease, sometimes you take pictures, microscopic pictures of cells on your skin, and then analyze those pictures; that's what we're going to talk about today. [00:49:15] So let me give the problem statement: you're a deep learning engineer, and you've been chosen by a group of healthcare practitioners to determine which parts of a microscopic image correspond to a cell. Here's how it looks. It's not actually a black-and-white image; it's a color image that looks black and white. The input image is the one that is closer to me, and the yellow one is the ground truth, which has been labeled by a doctor, let's say. So what you're trying to do
is to segment those cells on this image. [00:49:57] Have we talked about segmentation yet? A little bit. Segmentation is about producing a class for each of the pixels of an image. So in this case, each pixel would correspond to either no cell or cell, zero or one, and once we output a matrix of zeros and ones telling us which pixels correspond to a cell, we should hopefully get a mask like the yellow mask that I overlapped with the input image. Does that make sense? [00:50:34] [A student asks about the color image, where you don't have the boundaries for each cell.] Yeah, we'll talk about the boundaries later, but right now assume it's a binary segmentation, so 0 and 1, no cell and cell. [00:50:49] Okay, so this is going to be very interactive; I think we're going to use Menti for several questions and group you guys into groups of three. So here are other examples of images that were segmented with a mask. Now, doctors have collected 100,000 images
coming from microscopes, but the images [00:51:10] come from three different microscopes: there is a type A, a type B, and a type C microscope, and the data is split between these three as 50% for type A, 25% for type B, and 25% for type C. [00:51:24] The first question I have for you is: given that the doctors want to be able to use your algorithm on images from the microscope of type C (this microscope is the latest one, it's the one that is going to be used widely in the field, and they want your network to work on this one), how would you split your dataset into train, dev, and test sets? Please group into teams of two or three and discuss it for a minute. [00:53:02] You can start going on Menti and writing down your answers as well. Okay, take 30 seconds to input your insights on Menti; you can do one per team, [00:53:37] and we'll start going over some of the
answers here. [00:53:43] Okay, the first is: split C, train on A plus B, 20k in train, 2.5 in dev and test. Train 80 on all, 5k C dev, 10k C test. 95/5, where test and dev are from the population we care about. I think these are good answers; there is no perfect answer here, but there are two things to take into consideration. You have a lot of data, so you probably want to split it closer to 95/5 than to 60/20/20. And most importantly, you want to have C images in the dev and test sets, and the same distribution between those two; that's what you've seen in the third course. We would also prefer to have C images in the train set: you want your algorithm to have seen C images. [00:54:40] So I would say a very good answer is this one, 90/5/5, where the five and five are exclusively from C, and you also have C images in the 90% of training images.
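A split along those lines can be sketched as follows; the exact counts (5k dev, 5k test, everything else in train) are my reading of the numbers above, not a prescription.

```python
import random

random.seed(0)

# Hypothetical image ids: 50k type A, 25k type B, 25k type C (100k total).
data = ([("A", i) for i in range(50_000)]
        + [("B", i) for i in range(25_000)]
        + [("C", i) for i in range(25_000)])

# Dev and test come only from type C, the microscope we care about,
# so the two sets share the deployment distribution.
c_imgs = [d for d in data if d[0] == "C"]
random.shuffle(c_imgs)
dev, test = c_imgs[:5_000], c_imgs[5_000:10_000]

# Everything else, including the remaining type-C images, goes to train,
# so the algorithm has seen C images during training.
held_out = set(dev) | set(test)
train = [d for d in data if d not in held_out]

print(len(train), len(dev), len(test))  # 90000 5000 5000
```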
Any other insights [00:54:57] on that? [A student asks how to attack the case where microscopes A and B have, say, hidden features that mess things up.] Yeah, so there is one more thing we didn't talk about here, which is: how do we know what the distribution of microscope A images and microscope B images is versus microscope C? Do they look like each other? If they do, all good; if they don't, how can we make sure the model doesn't get bad hints from those distributions? [00:55:25] Another thing is data augmentation: we could augment this dataset as well and try to get as many C-distribution images as possible; we're going to talk about that. Okay, so: the split has to be roughly 95/5, not 60/20/20; the distribution of the dev and test sets has to be the same, and they contain images from C; and there should also be C images in the training set. [00:55:48] Now, talking about data augmentation: do you
Do you think you can augment this data? If yes, give three methods you would use; if no, explain why you cannot. Do you want to take 30 seconds to talk about it with your neighbors?

[00:57:38] Okay guys, let's go over some of the answers. Rotation, zoom, blur: looking at the images of the cells that we have, these might work very well. Rotation, zoom, blur, translation, combinations of those, stretching, symmetry; probably a lot of those work. One follow-up question that I have is: can someone give an example of a task where the augmentation might hurt the model rather than helping it?

[00:58:15] [Student: if you want to overfit on the test set?] Can you be more precise? [Student: when you don't want your model to generalize.] Oh, you don't want your model to generalize too much. Okay, yeah, there are some cases where you don't want the model to generalize too much, especially, you know, an autoencoder.
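The augmentations listed above (rotation, flips, translation, random noise) are easy to hand-roll in numpy as a sketch; a real pipeline would typically use an image-augmentation library, and the toy image here is made up.

```python
import numpy as np

def augment(img, rng):
    """Return a list of simple augmented variants of an H x W x 3 image."""
    return [
        np.rot90(img, k=1),                       # 90-degree rotation
        np.fliplr(img),                           # horizontal flip (symmetry)
        np.flipud(img),                           # vertical flip
        np.roll(img, shift=3, axis=1),            # crude translation (wraps around)
        np.clip(img + rng.normal(0, 0.05, img.shape), 0.0, 1.0),  # random noise
    ]

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))        # toy image with pixel values in [0, 1]
variants = augment(img, rng)       # five augmented copies of the input
```

Note that, as discussed next, flips are only safe when labels are flip-invariant; for characters like b/d or digits like 6/9 they silently corrupt the labels.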
But any other ideas? [Student: if you're doing face detection, would the face ever be upside down, or on either side?] I see. So if you do face detection, you probably don't want the face to be upside down, although you never know, depending on the use. It's not going to help much if the camera is always oriented like that and it's filming humans that are not upside down. But I don't think it's going to hurt the model; it's probably just not going to help it.

[00:59:02] Yeah, if you really stretch the image. So there are algorithms, like, you know, FlowNet, an algorithm that's used on videos to detect the speed of a car. If you stretch the images, you probably cannot detect the speed of the car anymore. Any other examples?

[00:59:34] Character recognition. I think it's a good example: let's say you're trying to detect b's and d's and you do a symmetric flip. You get that everything that was labeled "b" is now a "d", and everything that was "d" is a "b"; for nine and six it's the same story. So these data augmentations are actually hurting the model, because you don't relabel when you augment your data. Okay.

[01:00:05] So yeah, many augmentation methods are possible: cropping, adding random noise, changing contrast. I think data augmentation is super important. I remember a story of a company that was working on self-driving cars, and also on virtual assistants in cars, you know, the type of interaction you have with a virtual assistant in your car. They noticed that the speech recognition system was actually not working well when the car was going backwards. No idea why; this doesn't seem related to the speech recognition system of the car. They tested it and looked, and they figured out that people were putting their hands on the passenger seat, looking back, and talking to the virtual assistant; and because the microphone was in the front, the voice was very different when you were talking toward the back of the car rather than the front. So they used data augmentation to augment their current data: they didn't have data of that type, of people talking toward the back of the car. By augmenting smartly, you can change the voices so that they sound as if they came from someone talking toward the back of the car, and that solved the problem.

[01:01:19] Okay, a small question, we can do it quickly: what is the mathematical relation between n_x and n_y? Remember, we have an RGB image that we can flatten into a vector of size n_x, and the output is a mask of size n_y. What's the relationship between n_x and n_y?
Someone wants to go for it? [Student: they're equal.] Who thinks they're equal? They're not equal, and why? Because you have RGB on the input side: n_x will be 3 times n_y, because you have RGB images, and for each RGB pixel you have one output, 0 or 1. Okay, that was a question on one of the midterms; it was a complicated question. What's the last activation of your network? Sigmoid: you probably want an output between 0 and 1. And if you had several classes (later on we will see that we can also segment per disease), then you would have a softmax. What loss function should we use? I'm going to give it to you, to go quickly because we don't have too much time: you're going to use a binary cross-entropy loss over all the entries of the output of your network. Does that make sense? Always thinking through the loss function is interesting. Okay.
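A small numpy sketch of this setup, with made-up sizes and random values: an RGB input flattened to a vector of size n_x = 3·n_y, a per-pixel sigmoid, and binary cross-entropy averaged over all output entries.

```python
import numpy as np

h, w = 4, 4                              # tiny toy image
x = np.random.rand(h * w * 3)            # flattened RGB input, size n_x
n_x, n_y = x.size, h * w                 # one output per pixel: n_x = 3 * n_y

logits = np.random.randn(n_y)            # raw network outputs, one per pixel
y_hat = 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> values in (0, 1)
y = np.random.randint(0, 2, size=n_y)    # ground-truth 0/1 mask

# Binary cross-entropy averaged over all output entries.
eps = 1e-12
bce = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
```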
[01:03:03] So, you have a first try: you've coded your own neural network, you've named the model M1, and you've trained it for a thousand epochs. It doesn't end up performing well; it looks like this: you give the input image to the model and get an output that is expected to be the following one, but it's not. So one of your friends tells you about transfer learning, and about another labeled dataset of one million microscope images that have been labeled for skin disease classification, which are very similar to those you want to work with from microscope C. A model M2 has already been trained by another research lab on this new dataset, on a 10-class disease classification. Here is an example input/output of the model: you have an input image that probably looks very similar to the ones you're working with; the network has a certain number of layers and a softmax classification at the end that gives you the probability distribution over the diseases that seem to correspond to this image. So they're not doing segmentation anymore, right? They're doing classification. Okay, so the question here is going to be: you want to perform transfer learning from M2 to M1. What are the hyperparameters that you will have to tune? It's more difficult than it looks, so think about it, discuss with your neighbors for a minute, and try to figure out the hyperparameters involved in this transfer learning process.

[01:05:59] Okay, take 15 more seconds to wrap it up. [01:06:11] Okay, let's see what you guys have. "Learning rate": it is a hyperparameter; I don't know if it's specific to transfer learning. "Weights of the last layers": I don't think that's a hyperparameter; weights are parameters. "New cost function for additional output layers": the choice of the loss you might count as a hyperparameter, but I don't think it's specifically related to transfer learning; you will have to train with the loss you've used on your model M1. "Number of new layers": yeah. "Weights of the new layers": not a hyperparameter. Okay, last one: "whether to train layers of M2". So do we train, what do we fine-tune; it's a lot about layers, actually. "Size of added layers": not sure. Okay, let's go over it together, because it seems that there are a lot of different answers here. I'll try to write it down here.

[01:07:18] Let's say we have the model M2 (is the board big enough?): we give it an input image, and the model M2 gives us a probability distribution, a softmax. So we have a softmax here. You will agree that we probably don't need the softmax layer; we don't want it, because we want to do segmentation. So one thing we have to choose is how much of this pretrained network (because it is a pretrained network) we keep. Let's say we keep these layers, because they probably know the inherent salient features of the dataset, like the edges of the cells that we're very interested in. So we take them, and you will agree that here we have a first hyperparameter, L: the number of layers from M2 that we take.

[01:08:21] Now, what other hyperparameters do we have to choose? We probably have to add a certain number of layers here in order to produce our segmentation, so there's another hyperparameter, L0: how many layers do I stack on top of this? And remember, these layers are pretrained, but these new ones are randomly initialized. Does that make sense? So, two hyperparameters. Anyone see the third one?

[01:09:06] The third one comes when you decide to train this new network. You have the input image, give it to the network, and get the output segmentation mask (let's say "seg mask"). What you have to decide is how many of these pretrained layers you will freeze. Probably, if you have a small dataset, you'd prefer keeping the features that are here, freezing them, and focusing on retraining the last few layers. So there is another hyperparameter, LF: how much of this will I freeze? What does it mean to freeze? It means that during training I don't train these layers: I assume they've been seeing a lot of data already, they understand very well the edges and less complex features of the data, and I'm going to use my small dataset to train the last layers. So, three hyperparameters: L, L0, and LF. Does that make sense? Okay, so this is for transfer learning.
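The three hyperparameters just named, L pretrained layers kept from M2, L0 new layers stacked on top, and LF of the kept layers frozen, can be illustrated schematically. The layer objects below are toy stand-ins (plain dicts, not any framework's API), and the sizes are arbitrary.

```python
def build_transfer_model(m2_layers, L, L0, LF):
    """Keep the first L layers of pretrained M2, freeze the first LF of them,
    and stack L0 randomly initialized layers on top."""
    assert 0 <= LF <= L <= len(m2_layers)
    kept = [dict(layer, frozen=(i < LF)) for i, layer in enumerate(m2_layers[:L])]
    new = [{"name": f"new_{i}", "pretrained": False, "frozen": False}
           for i in range(L0)]
    return kept + new

# Pretend M2 has 8 pretrained layers (the softmax head already dropped).
m2 = [{"name": f"m2_{i}", "pretrained": True} for i in range(8)]
model = build_transfer_model(m2, L=6, L0=2, LF=4)
```

In a real framework the same idea is expressed by loading M2's weights, truncating the head, setting a `trainable`/`requires_grad` flag to false on the first LF layers, and appending fresh layers.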
[01:10:13] The question was more complicated than it looked. Okay, let's move on. Where am I? Okay, let's go over another question. So, this we did. Now it's interesting, because here we have an input image; in the middle we have the output that the doctor would like; but on the right you have the output of your algorithm. You see that there is a difference between what they want and what we're producing, and it goes back to what someone mentioned earlier: there is a problem here. How do you think you can correct the model and/or the dataset to satisfy the doctor's request? The issue with this image is that they want to be able to separate the cells from one another, and they cannot do it based on your algorithm; it's still a little hard, there's something to add. Can someone come up with the answer? Actually, you mentioned one of the answers; do you want to explain it, so that we can finish this lecture?

[01:11:17] [Student: it looks like you could have, say, three cells on the bottom left blurring together; if you add boundaries, it makes the cells more well-defined.] Good answer. So one way: when you labeled your dataset originally, you labeled every pixel with zeros and ones. Now, instead, you will label with three classes: zero, one, or boundary; let's say 0, 1, and 2 for boundary. Or even, the best method I would say, for each input pixel the output will be the corresponding label: p(cell), p(boundary), and p(no cell). What you will do is, instead of having a sigmoid activation, you will use a softmax activation, and the softmax will be applied per pixel.
One other way, if it still doesn't work: suppose you relabel your dataset taking the boundaries into account, and the model still doesn't perform. What is another way to do it? I think it's all about the weighting of the loss function. It's likely that the number of pixels that are boundaries is much smaller than the number of pixels that are cells or no-cells, so the network will be biased toward predicting cell or no-cell. Instead, what you can do is, when you compute your loss function, give it three terms: one binary cross-entropy for no-cell, one for cell, and one for boundary, summed over i = 1 to n_y, over all the output pixel values. Then you attribute a coefficient to each of those, alpha, beta, and 1, and by tweaking these coefficients, if you put a very high coefficient on the boundary term,
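One way to realize this weighted per-pixel loss is a class-weighted cross-entropy, sketched below in numpy. The shapes, random inputs, and the specific weight values are illustrative only, not from the lecture; the weights play the role of the alpha, beta, 1 coefficients.

```python
import numpy as np

def weighted_pixel_loss(logits, labels, class_weights):
    """Per-pixel softmax cross-entropy, with one weight per class.

    logits: (n_pixels, 3) raw scores for [no-cell, cell, boundary]
    labels: (n_pixels,) integer class per pixel
    class_weights: length-3 coefficients, e.g. [alpha, beta, 1]
    """
    z = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_true = probs[np.arange(len(labels)), labels]     # prob of the true class
    w = np.asarray(class_weights)[labels]              # weight of each pixel's class
    return np.mean(-w * np.log(p_true + 1e-12))

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 3))
labels = rng.integers(0, 3, size=16)
# Up-weight boundary pixels (class 2) so missing a boundary costs more.
loss = weighted_pixel_loss(logits, labels, class_weights=[1.0, 1.0, 10.0])
```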
it means you're telling your model to focus on the boundary; you're telling the model that if it misses the boundary, it's a huge penalty, and we want it to train by figuring out all the boundaries. That's another trick that you could use. One question on that? Yeah, good question: what do I mean by relabeling your dataset? In the last assignment section you've been labeling bounding boxes, you know, for the YOLO algorithm. The same tools are available for segmentation: you have an image and you draw the different lines. In practice, with the tool you were using, the line you drew would just count as cell: everything inside what you draw, plus the boundary, would count as cell, and the rest as no-cell. It's just a line of code to make it different, so that the line you drew counts as boundary, everything inside counts as cell, and everything outside counts as no-cell. So it's the way you use your labeling tool, that's all.

[01:14:31] [Student question about alpha and beta.] I think they're not learnable parameters; they're hyperparameters to tune. The same way you tune lambda for your regularization, you would tune alpha and beta.

[01:14:53] And this is not an attention mechanism, because it's just a training trick, I would say. It cannot tell you, for each image, how much the model is looking at this part versus that part; it's not going to tell you that. It's just a training trick.

[01:15:10] [Student: what's the advantage of doing it this way, versus object detection, like the tank thing?] So the question is: what's the advantage of segmentation rather than detection? Detection means you want to output a bounding box. If you output a bounding box, what you could do is crop it out, then analyze the cell and try to find the contour of the cell. But if you want to separate the cells, if you want to be very precise, segmentation is going to work well; if you want to be very fast, bounding boxes will work better. And, as you guys saw, segmentation does not run as fast as the YOLO algorithm does for object detection, I would say, but it's much more precise.

[01:15:55] Okay, so: modify the dataset in order to label the boundaries; on top of that, you can change the loss function to give more weight to boundaries, or penalize false positives.

[01:16:06] Okay, we have one more slide, I think, so let's go over it. Now the doctors give you a new dataset that contains images similar to the previous ones; the difference is that each image is now labeled with zero or one, zero meaning there are no cancer cells in that image, and one meaning there is at least one cancer cell in the image. So we're not doing segmentation anymore; it's binary image classification: cancer or no cancer. Okay, so you easily build a state-of-the-art model, because you're a very strong person in classification, and you achieve 99% accuracy. The doctors are super happy, and they ask you to explain the network's predictions: given an image classified as one, how can you figure out based on which cell the model predicted one? I've talked a little bit about that, and there are other methods that you should be able to figure out right now, even if you don't know class activation maps.

[01:17:15] To sum it up: we have an input image, we put it into the neural network, which is a binary classifier, and the network says one. You want to figure out why the network said one, based on which pixels. What do you do? [Student: visualize the weights.] What do you visualize in the weights? I think visualizing the weights is not related to the input; the weights are not going to change based on the input, and here you want to know why this input led to one. So it's not about the weights, but good idea. So, you get the one here; this is y-hat, and it's not exactly one, let's say it's a 0.7 probability. What you can remember is this: the derivative of y-hat with respect to x is what? It's a matrix of the same shape as x, and each entry of the matrix is telling you how much moving that pixel influences y-hat. You agree? So the top-left number here is telling you how much x_1 is impacting y-hat. Is it, or not? Maybe it's not: if you have a cat detector and the cat is here, you can change this pixel and it's never going to change anything, so the value here is going to be very small, close to zero.
cancer cell is here you will see high [01:18:59] cancer cell is here you will see high number in this part of the matrix [01:19:01] number in this part of the matrix because this this these are the pixel [01:19:03] because this this these are the pixel that if we move them it will change Y [01:19:05] that if we move them it will change Y hat does it make sense it's a quick way [01:19:07] hat does it make sense it's a quick way to interpret your network it doesn't [01:19:09] to interpret your network it doesn't it's not too too good like you're not [01:19:13] it's not too too good like you're not gonna have tremendous results but you [01:19:14] gonna have tremendous results but you should see these pixels have higher [01:19:16] should see these pixels have higher derivative values than the others okay [01:19:20] derivative values than the others okay that's one way and then we will see in [01:19:21] that's one way and then we will see in two weeks how to interpret neural [01:19:24] two weeks how to interpret neural networks visualizing the weights [01:19:25] networks visualizing the weights included and all the other methods okay [01:19:31] included and all the other methods okay so gradient with respect to your model [01:19:33] so gradient with respect to your model detects cancer cells from the test set [01:19:37] detects cancer cells from the test set images with 99% accuracy while a doctor [01:19:40] images with 99% accuracy while a doctor would on average perform 97% on the same [01:19:43] would on average perform 97% on the same task is this possible or not [01:19:47] who thinks it's possible to have a [01:19:50] who thinks it's possible to have a network that achieves more accuracy on [01:19:52] network that achieves more accuracy on the test set than the doctor okay can [01:19:55] the test set than the doctor okay can someone can someone say why you have an [01:20:03] someone can someone say why you have an explanation [01:20:11] okay the network probably 
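The input-gradient idea described above — the derivative of ŷ with respect to x as a per-pixel influence map — can be sketched with a toy model. This is a minimal sketch under assumptions, not the lecture's code: a one-layer logistic "classifier" is assumed so that ∂ŷ/∂x has a closed form; for a real deep network you would get the same quantity from an autodiff framework rather than by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def saliency_map(x, w):
    """Input-gradient saliency for a toy logistic classifier y_hat = sigma(<w, x>).

    For this simple model the derivative has a closed form:
        d y_hat / d x = y_hat * (1 - y_hat) * w
    so the saliency map has the same shape as the input image x.
    A real network would compute the same quantity via autodiff
    (e.g. tape.gradient / loss.backward), not by hand.
    """
    y_hat = sigmoid(np.sum(w * x))
    return y_hat * (1.0 - y_hat) * w

# Toy 4x4 "image" whose informative pixels (the "cancer cell") sit in the
# bottom-right 2x2 block; the classifier's weights are zero elsewhere.
w = np.zeros((4, 4))
w[2:, 2:] = 1.0
x = np.random.rand(4, 4)

s = np.abs(saliency_map(x, w))
# Pixels the prediction depends on get large |d y_hat / d x|; pixels the
# model ignores (like the corner of a cat-free region) stay at ~0.
print(s[0, 0], s[3, 3])
```

With an actual CNN the recipe is the same: run the forward pass, take the gradient of the predicted score with respect to the input pixels, and look at where its magnitude concentrates.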
[01:20:13] Okay — the network probably looks at complex things that the doctor didn't see; that's what you're saying. Possibly. I think there's a more rigorous explanation. [01:20:30] So here we're talking about Bayes error, human-level performance, and all that stuff — that's where you should see it. One thing is that there are many concepts you will see in Course 3 that are actually implemented in industry, but it's not because you know them that you're going to recognize when it's time to use them, and that's what we want you to get to. Like, now, when I ask you this question, you have to think about Bayes error, human-level accuracy, and so on. [01:20:53] So the question you should ask here is: how was the dataset labeled — where were the labels coming from? If the dataset was labeled by individual doctors, I think it's very weird that the model performs better on the test set than what doctors have labeled, simply because the labels are wrong three percent of the time on average. You're teaching wrong things to your model three percent of the time, so it's surprising that it gets better — it could happen, but it's surprising. [01:21:28] But if every single image of the dataset has been labeled by a group of doctors — I've talked about it — then the average accuracy of this group of doctors is probably higher than one doctor's; maybe it's 99 percent, in which case it makes sense that the model can beat one doctor. Does this make sense? So you have a Bayes error you're trying to approximate, which is like the best error you can achieve. A group of doctors is probably better than one doctor; this is your human-level performance.
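The point that a panel of doctors yields better labels — and hence a higher human-level reference — than a single doctor can be checked with a quick simulation. This is a hedged sketch: the 3% per-doctor error rate comes from the example above, but the independence assumption and the 5-doctor majority vote are made-up modeling choices, not something specified in the lecture.

```python
import random

random.seed(0)

DOCTOR_ERR = 0.03   # single doctor mislabels 3% of images (from the example)
PANEL = 5           # panel size (assumed)
N = 100_000

def doctor_label(truth):
    # A doctor flips the true label with probability DOCTOR_ERR.
    return truth if random.random() > DOCTOR_ERR else 1 - truth

single_correct = 0
panel_correct = 0
for _ in range(N):
    truth = random.randint(0, 1)
    if doctor_label(truth) == truth:
        single_correct += 1
    # Majority vote of PANEL independent doctors.
    votes = sum(doctor_label(truth) for _ in range(PANEL))
    majority = 1 if votes > PANEL / 2 else 0
    if majority == truth:
        panel_correct += 1

print(f"single doctor label accuracy: {single_correct / N:.4f}")  # ~0.97
print(f"5-doctor majority accuracy:  {panel_correct / N:.4f}")    # ~0.9997
```

Under these (strong) independence assumptions the majority label is wrong only when 3 or more of the 5 doctors err, which is rare; correlated mistakes between doctors would shrink the gap.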
[01:21:55] And then you should be able to beat one doctor.

[01:22:00] Okay, so you want to build a pipeline that goes from an image taken by the front of your car to a steering direction, for autonomous driving. What you could do is send this image to a car detector that detects all the cars, and a pedestrian detector that detects all the pedestrians, and then give those to a path planner, let's say, that plans the path and outputs the steering direction. So it's not end-to-end; end-to-end would be: I have an input image, and I want this output directly. [01:22:49] So a few disadvantages of this: something can go wrong anywhere in the model, and how do you know which part of the model went wrong? Can you tell me which part? I give you an image, the steering direction is wrong — why?

[01:23:21] Good idea — looking at the different components. So what you can do is look at what happens here and there. [01:23:32] Based on this image, do you think the car detector worked well or not? You can check it. Do you think the pedestrian detector worked well or not? You can check it. If there is something wrong there, it's probably one of these two items — it doesn't mean this one is good, it just means that those two are wrong. How do you check that this one is good? You can label ground-truth images and give them as input to the path planner, and figure out whether it gets the steering direction right or not. If it does, it seems the path planner is working; if it does not, it means there's a problem there. [01:24:04] Now, what if every single component seems to work properly — let's say these two work properly — but there is still a problem? It might be because what you selected as a human, the intermediate representation, was wrong, so the path planner cannot get the steering direction correct.
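The component-by-component debugging just described — swap ground-truth detections into the downstream stage to see which stage is at fault — can be sketched like this. Everything here (the stub detectors, the toy planner rule, the data format) is hypothetical scaffolding so the isolation logic itself is runnable, not CS230 code:

```python
def car_detector(image):
    # Pretend this model is buggy: it misses every car.
    return []

def pedestrian_detector(image):
    # Pretend this model is perfect: it returns the ground truth.
    return image["true_pedestrians"]

def path_planner(cars, pedestrians):
    # Toy rule: steer away if any obstacle is detected.
    return "left" if (cars or pedestrians) else "straight"

def debug_pipeline(example):
    """Localize the fault by feeding ground truth into each stage."""
    # 1) Full pipeline output vs expected steering.
    end_to_end = path_planner(car_detector(example),
                              pedestrian_detector(example))
    # 2) Planner fed ground-truth detections: isolates the planner.
    planner_only = path_planner(example["true_cars"],
                                example["true_pedestrians"])
    if planner_only != example["steering"]:
        return "path planner (or the chosen intermediate representation)"
    if end_to_end != example["steering"]:
        return "one of the detectors"
    return "no fault found"

example = {"true_cars": ["car@right"], "true_pedestrians": [],
           "steering": "left"}
print(debug_pipeline(example))  # -> "one of the detectors"
```

The planner gets the right answer when given perfect detections, so the error must come from upstream — the same reasoning as labeling ground-truth images and feeding them to the path planner by hand.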
Based on only the pedestrian and the car detections it may not be able to — it probably needs the stop signs and stuff like that as well. So because you made hand-engineering choices here, your model might go wrong; that's another thing. [01:24:36] And an advantage of this type of pipeline is that data is probably easier to find for each individual algorithm than for the whole end-to-end pipeline. If you want to collect data for the entire pipeline, you would need to take a car, put a camera on the front, and build a kind of steering-wheel-angle detector that measures your steering wheel at every step while you're driving. So you'd basically need to drive everywhere with a car that has this feature — it's pretty hard, and you need a lot of data, a lot of roads. While with this one, you can collect images anywhere, label the pedestrians on them, and detect cars by the same process. [01:25:20] Okay, so these choices also depend on what data you can access easily and what data is harder to acquire. Any questions on that?

[01:25:34] You're going to learn about convolutional neural networks now — we're going to have fun with a lot of imaging. You have a quiz and the programming assignments for the first module, and the same for the second module. The midterm is next Friday, not this one. [01:25:47] Everything up to C4M2 will be included in the midterm — so up to the videos you're watching this week — and that includes the TA sections (this week's and the next one) and every in-class lecture, including next Wednesday's. This Friday you have a TA section. Any questions on that? [01:26:07] Okay, see you next week, guys. ================================================================================ LECTURE 006 ================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 6 - Deep Learning Project Strategy Source: https://www.youtube.com/watch?v=G5FNYxbW_Qw --- Transcript [00:00:05] All right, hey everyone, welcome back. Can people hear me? Okay, all right.
[00:00:13] So as usual, you can take a second to enter your SUID so we know who's here.

[00:00:17] So today's lecture will be a Choose Your Own Adventure lecture. I think by now you've learned a lot about the technical aspects of building learning algorithms, and then in the third set of modules you saw some of the principles for debugging learning algorithms and how to actually use these tools in order to be efficient in how you build a machine learning application. What I want to do today is step through with you a moderately complicated machine learning application, and throughout today's lecture I'm going to step you through a scenario and then ask you to kind of choose your own adventure — if you were working on this project, what would you do next? — to give you more practice, in the hour and a bit that we have, on thinking through machine learning strategy.

[00:01:17] And you know, I've seen in so many projects that there are sometimes things a less strategically sophisticated team will take a year to do, but if you're actually very strategic and very sophisticated in deciding what to do next — how to drive a project forward — I've seen many times that what one team takes a year to do, another could do in a month or two. And if you're trying to, I don't know, write a research paper, or build a business, or build a product, the ability to drive a machine learning project forward quickly gives you a huge advantage — and you make much more efficient use of your life as well, right?

[00:02:00] So for today I'm going to pose a scenario — pose a machine learning application — and say: all right, you are the CEO of this project; what are you going to do next? But I'd like today's meeting to be quite interactive as well, so can I get people to sit in groups of two or ideally three or so, maybe plus or minus one, and try to sit next to someone you don't work with all the time. If you're sitting next to your best friend — I'm glad your best friend is in the class with you, but go sit with someone else, because I've done this multiple times and the discussion is actually richer if you talk to someone you don't know super well. So take a second, introduce yourself, and reach your neighbor, I guess.

[00:02:48] The example I want to go through today is actually a continuation of the example I described briefly in the last lecture I taught, building a speech recognition system. [00:03:02] Remember, I briefly motivated this trigger-word, or wake-word, detection system last time. You know, I actually have both an Amazon Echo and a Google Home, but it's a lot of work to configure these things to turn your light bulbs on and off. So if you can build a chip to sell to, say, a lamp maker, to recognize phrases — let's say we call the lamp Robit — then you can recognize phrases like "Robit, turn on" and "Robit, turn off," and you could have a little switch to give the thing different names: you call it Robit, or Lena, or Alice, or something, so you can also have "Lena, turn on," "Lena, turn off." Give your lamp a name and just say, hey, "Robit, turn on." [00:03:54] So rather than detecting different names and turn on and turn off, for the technical discussion I'm just going to focus on
the phrase "Robit, turn on" — but it's kind of the same problem, which you'd need to solve like four times to support two names and both turn on and turn off. So I'm going to abbreviate "Robit turn on" as RTO — if you want to call your lamp Robit and tell it to turn on. I think I was inspired by Isaac Asimov, who wrote those robot novel series, and all his robots' names started with R — so maybe it's "robot, turn on."

[00:04:33] So let's say that you are the new CEO of a small startup with, you know, three people, and your goal is to build a learning algorithm that can recognize this phrase, "Robit, turn on," so that when someone buys this lamp and says "Robit, turn on," the lamp turns on. [00:05:03] To be CEO of this startup you'd need to do a lot of things — figure out the embedded circuitry, figure out who the lamp makers are, the sales, all that stuff — but for today let's just focus on the machine learning aspect of it.

[00:05:14] And so my first question to you is very open-ended — and this is the life of a CEO, right: you wake up one day and you just have to decide what to do. So my first, open-ended question is: you're going to show up at work tomorrow in your startup office, and you want to build a learning algorithm to detect the phrase "Robit, turn on" for this application. What are you going to do? Take a minute and answer that by yourself first — no, don't discuss with your neighbor yet — but you know, you're going to show up in your office and then start working on this engineering problem, to build a neural network to do this. [00:05:57] And do this as yourself, right? Don't pretend you're some hypothetical startup CEO with ten billion dollars to spend; just do it as, let's say, yourself. And I don't think this is a terrible startup idea — it's not the best idea, but I think this could work, so you're actually welcome to do it. But let's say you decide to do this, and you go into your office tomorrow: what do you do? Why don't you take, let's say, two minutes to enter an answer, and then we can discuss.

[00:06:31] In fact — yeah, yes — one thing I really like about that answer is actually the "review existing literature" part. In fact, when you're starting a new project — and I think, when you're starting a new
project like that, assuming you've not worked on trigger word detection before — reading research papers, or reading code on GitHub, or reading blog posts on the problem is actually a very good way to quickly level up your knowledge. [00:07:03] And in terms of your exploration strategy, I want to describe to you how I read research papers. [00:07:14] This is not a good way to review the literature: if the x-axis is time and the vertical axis is research papers, what some people will do is find the first research paper and read it until it's done, then go find the second research paper and read it until it's done, then go find the third — this very sequential way of reading research papers. [00:07:40] I find that the more strategic way to go through these resources — everything ranging from blog posts (there are lots of good Medium articles that explain things), to research papers, to GitHub — is to use a parallel exploration process. This is actually what it feels like when I'm doing research, when I'm trying to learn about a new field I'm not that expert in. I've actually done a lot of work on trigger word detection, but if I hadn't worked on it before, then I would probably find, you know, three papers — so again, the x-axis is time and the vertical axis is different papers — and read a few of them in parallel at a surface level, skim them, and based on that decide to read one in greater detail, then add other papers that I start skimming, maybe find another one I want to read in great detail, and gradually add new papers to my reading list — reading some to completion and some not to completion.

[00:08:39] Yeah, I was actually chatting with one of my friends, Pieter Abbeel, a former student, at Berkeley, who mentioned that he wanted to learn about a new topic, and he told me he was compiling a reading list of 200 research papers he wants to read. That sounds like a lot — you rarely read 200 papers — but I think if you read 10 papers you have a basic understanding, if you read 50 you have a pretty decent understanding, and if you read like 100, I think you have a very good understanding of a field. Often this is time well spent, I guess.

[00:09:14] And some other tips — again, I'm really thinking: if you really are CEO of this startup and this is what you want to do, what advice would I give you? When you're reading papers, other things to realize: one is that some papers don't
make sense, right? And that's fine; even I read some papers and just go, no, I don't think that makes sense. [00:09:39] It's not that uncommon for us to find papers from a decade ago where half of it was great and the other half talked about things that were not that important. So that's okay: usually papers are technically accurate, but often what the author thought was important, like maybe that using batch norm was really important for this problem, just turns out not to be the case. That happens sometimes. [00:10:06] And I think the other tactic that I see Stanford students sometimes not use enough is talking to experts, including contacting the authors. When I read a paper, I don't bother the authors unless I've actually tried to figure it out myself, right? But if you actually spend some time trying to understand the paper and it really doesn't make sense to you, [00:10:29] it is okay to email the authors and see if they respond. People are busy, so maybe there's a 50% chance they respond, and that's okay, because it takes you five minutes to write an email, and a 50% chance they get back to you could be time pretty well spent. [00:10:44] But don't bother people unless you've tried to do your own work; I actually get a lot of emails from high school students who do not feel like they've done their own work. So just don't bother people unless you've actually tried. [00:11:03] Cool. So after looking at the literature, and maybe downloading an open source implementation or getting a sense of an algorithm you want to try. Oh, and it turns out the trigger word detection literature is actually one literature where there isn't consensus on which algorithms are good and which are bad. [00:11:25] Despite all the trigger word, or wake word, detection systems that some of you may already use, there isn't actually consensus in the research community today on the best algorithm to try. [00:11:37] But let's say that you read some papers, downloaded some open source implementations, and now you want to start training your first system. [00:11:47] Last time we talked a little bit about how much time you would spend to collect data, and we said: spend a small amount of time, like a day or maybe two days at most, to collect your first data set to start training up a model. [00:12:00] But my next question to you is: what data would you collect?
Right, in particular, what train/dev/test data would you collect? [00:12:18] So you've decided on an initial neural network architecture, and you want to train something to recognize this phrase, "Robert turn on." [00:12:30] I don't think it's possible to download a data set for this; I don't think anyone has collected a data set with the words "Robert turn on" and posted it on the internet, so you have to collect your own data for this particular trigger phrase. [00:12:42] So as CEO of this startup, trying to build a neural net to detect the phrase "Robert turn on," what data do you collect? Again, take, let's say, three minutes to write an answer to this. [00:13:01] Yeah, I think this is an interesting one: record "Robert turn on" over and over, and then use data augmentation. [00:13:08] Data augmentation is one of those techniques that is a way to reduce variance in your learning algorithm, because you're generating more data. [00:13:19] And having worked on this problem, I happen to know data augmentation is very useful here. But if you didn't already know that fact, this is one of the things I would probably not do right away, because I would train a quick and dirty system and validate that you really have a high variance problem before investing the effort in data augmentation. [00:13:40] So data augmentation is one of those techniques that rarely hurts and usually helps, but I wouldn't bother making that investment unless you have collected the evidence that you actually have a high variance problem and that this is actually a good use of your time, right?
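As a concrete illustration, one common form of audio data augmentation is overlaying background noise on recorded clips. The sketch below is my own illustrative code, not the course's implementation: the function name, the SNR parameterization, and the raw-sample representation are all assumptions.

```python
import numpy as np

def augment_with_noise(clip, noise, snr_db=10.0, rng=None):
    """Overlay a random segment of background noise on an audio clip at a
    target signal-to-noise ratio (in dB).

    `clip` and `noise` are 1-D arrays of raw audio samples. Illustrative
    sketch only, not the course's actual augmentation code."""
    rng = rng or np.random.default_rng()
    # Pick a random offset so each augmented copy gets a different noise segment.
    start = int(rng.integers(0, len(noise) - len(clip) + 1))
    segment = noise[start:start + len(clip)]
    # Scale the noise so clip power / noise power matches the target SNR.
    clip_power = float(np.mean(clip ** 2))
    noise_power = float(np.mean(segment ** 2)) + 1e-12
    scale = np.sqrt(clip_power / (noise_power * 10 ** (snr_db / 10)))
    return clip + scale * segment
```

Each recorded clip can be turned into many training examples this way, but per the advice above, only invest in this after a quick-and-dirty model shows you actually have a high variance problem.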
Yeah, I think this one is actually nice: record everyone saying "Robert turn on" 100 times. The really nice thing about that is that you can get it done really quickly. [00:14:23] When I'm working with teams, I actually think in terms of hours, in terms of how long it would take us to do something, and this one you could probably do in like 30 minutes, so you get your data set collected in 30 minutes and get going. [00:14:37] Or, if you run around Stanford and just ask friends or strangers to speak into your laptop microphone, you spend a few hours to get a much bigger data set than would otherwise be possible for a startup. So I would probably do that; I would actually go and collect data for several hours rather than only spend 30 minutes. But this answer is pretty interesting as well, because it lets you get it done really quickly. Does that make sense? [00:15:03] So let me share some more concrete advice. Some time back, to prepare a homework problem that you'll see later in this course, Kian and Younes and I were actually building this system ourselves to create that homework, so this trigger word is a nice running example that we're using at a few points throughout this course. [00:15:30] So here's one thing you can do, and this is actually what we did (simplifying a little bit): collect 100 examples of 10-second audio clips. [00:15:55] It turns out that once you grab hold of someone and ask them to speak into your microphone, you can keep them for 3 seconds, which is how long it takes to say "Robert turn on," or you can keep them for 10 seconds; they're actually very willing to spend the extra seven seconds with you. [00:16:16] So this is 10 seconds of audio data, and audio is just patterns of little changes in air pressure, right? If you plot audio, the reason it looks like a waveform is that the way you're hearing my voice is that my voice, or the speakers, create very rapid changes in air pressure, and your ear measures those very rapid changes in air pressure and interprets the sound. A microphone is a sensitive device for recording these very high frequency changes in air pressure, and the plots you see of audio are just the air pressure at different moments in time. [00:16:52] So given a 10-second clip like this, if this is the 3-second section where they said "Robert turn on," then what you would like to do, say you're building a desk lamp that can sit here, is: the lamp stays turned off, off, off, off, and at the moment they finish saying "Robert turn on," you turn it on. So that is the output label y, really; before that, it's not detecting the phrase. [00:17:27] So what you want for the trigger word system is that at pretty much the moment they finish saying "Robert turn on," you want your learning algorithm to output a one; that's your target label y saying, yep, I just heard the trigger word. And at all other times you want it to output zero, because the one is the moment when you decide to turn on the lamp. [00:17:52] So to collect a data set, here's something you can do: collect 100 audio clips of 10 seconds each. [00:18:07] And when I'm prioritizing my work or my team's work, I really look at these numbers and think about how long each step will take.
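The labeling scheme just described (output 0 at every step except right after the speaker finishes the phrase) can be sketched as follows. The number of output steps and the width of the run of 1s are illustrative choices of mine, not the course's exact values; widening the 1s slightly is a common trick to keep positive labels from being vanishingly rare.

```python
import numpy as np

def make_target(n_steps, phrase_end_step, ones_width=3):
    """Target sequence y for one clip: 0 everywhere except a short run of
    1s starting at the step where the speaker finishes "Robert turn on".
    All sizes here are illustrative assumptions."""
    y = np.zeros(n_steps, dtype=np.int64)
    # Fire the label at the moment the phrase ends, for a few steps.
    y[phrase_end_step:phrase_end_step + ones_width] = 1
    return y

# One clip with 100 output steps, where the phrase ends at step 40:
y = make_target(100, 40)
```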
[00:18:15] Let's say you're running around Stanford and you want to collect 100 audio clips, maybe from 10 people at 10 clips per person, or maybe from 100 different people. I would actually estimate it like this: if you go to a Stanford cafeteria, how long does it take to get one person? You could probably get one person every minute or two at a busy place like a Stanford cafeteria, so you could probably get this done in 100 to 200 minutes, like two or three hours. That's not bad; you can get this done quite quickly. [00:18:47] So let's collect 100 audio clips, and actually, for the purposes of today, let's say you collect 100 audio clips to use for training, 25 for your dev set, and zero for the test set. [00:19:10] It's actually not that uncommon, if you're building a new product, to just not have a test set, because your goal is to build something that convinces you; in the early prototyping phases of a project, sometimes I don't bother with a test set. If it goes into a published paper, then of course you need a rigorously collected test set, but if you're just building a product and you don't need a rigorous evaluation, sometimes you can just get started without dealing with a test set, so it's pretty easy to get started. [00:19:54] All right, so taking that audio clip from above, one thing you can do to turn this into a supervised learning problem is the following. The phrase "Robert turn on" can be said in less than 3 seconds, so let's say you take 3 seconds as the duration of audio. Let's say here is when "Robert turn on" was said; then what you can do is write the target labels, 1s and 0s, and then
clip out different audio clips of 3 seconds. [00:20:32] So here's one audio clip; you can take that audio clip, this is x, and the target label is zero, because "Robert turn on" was not said. [00:20:46] And you can take another audio clip, a different randomly clipped 3-second window, and that clip also has the target label zero. [00:21:02] And for this one, a 3-second clip that ends right at the last part of the "on" sound, you would have a target label of one. [00:21:14] When you learn about sequence models and RNNs you'll learn a better method than this explicit clipping, but for now let's say you take these audio clips: take a 10-second clip, and by clipping out different random windows, you can take your, let's say, 100 clips and, because each 10-second clip yields multiple windows, turn this into, let's say, 3,000 training examples. [00:21:47] So here I took a 10-second clip and showed you three different 3-second windows, but if you take 30 3-second windows, then each 10-second audio clip becomes 30 examples. And now you've turned the problem into a binary classification problem, where you need to train a neural network that inputs a 3-second clip and labels it as either zero or one. Does that make sense? [00:22:13] So this is an example of the more complex pipelines you might have when you're building a learning algorithm: taking a continuous audio detection problem and turning it into a binary classification problem, which you've learned how to build various neural networks for. And again, when you learn about RNNs, you'll learn about other ways to process sequence data, or temporal data. [00:22:36] Okay.
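The clipping procedure above can be sketched roughly like this. The sample rate, the number of windows per clip, and the labeling tolerance are assumptions for illustration, not the lecture's exact numbers.

```python
import numpy as np

FS = 16000          # assumed sample rate (samples per second)
CLIP_S, WIN_S = 10, 3  # 10-second source clips, 3-second windows

def windows_from_clip(clip, phrase_end, n_windows=30, tol_s=0.25, rng=None):
    """Cut `n_windows` random 3-second windows out of one 10-second clip.

    `phrase_end` is the sample index where "Robert turn on" finishes
    (None if the phrase isn't in this clip). A window is labeled 1 when it
    ends just after the phrase ends, 0 otherwise. Illustrative sketch."""
    rng = rng or np.random.default_rng()
    win, tol = WIN_S * FS, int(tol_s * FS)
    examples = []
    for _ in range(n_windows):
        start = int(rng.integers(0, len(clip) - win + 1))
        end = start + win
        # Positive label only if the window ends right after the phrase.
        label = int(phrase_end is not None and 0 <= end - phrase_end <= tol)
        examples.append((clip[start:end], label))
    return examples

# 100 ten-second clips at 30 windows each would give 3,000 binary examples.
```

Note how unbalanced the resulting labels are: only windows ending right at the phrase get a 1, which matters for the scenario discussed next.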
Go ahead. A student asks: so the idea right now is to manually label the data; is this manually labeled? Yes, I would, actually. If you have 100 examples, it's not that hard to just listen to them on your laptop with some audio playing software, figure out when they finish saying "Robert turn on," and at that moment put a one in the target label, because that is really when you want the lamp to turn on. Makes sense? [00:23:15] Cool. Any other questions? Actually, feel free to ask clarifying questions. Yeah, go ahead. A student asks: I wonder if this is going to cause the problem that the ones are too sparse. Oh sure, let me get back to that. Anything else? [00:23:29] Another student asks: is there a specific reason we only train with 3 seconds of voice instead of five, since some people's voices are slower? Oh, I see: why do we use 3 seconds and not 4 or 5 seconds, and is that another hyperparameter you can test? So, I don't know; you'd have to say it really slowly for it to take longer than 3 seconds, right? "Robert turn on." So again, it's a design choice. [00:24:05] All right. So let's say you do this, feed it to a supervised learning algorithm, and train a neural network, and let's say that when you run this algorithm you end up with 99.5% accuracy, [00:24:28] but you find that the algorithm has zero detections. [00:24:43] What I mean is that whatever audio you give it, it just outputs zero all the time; the algorithm just says, nope, I never heard the phrase "Robert turn on."
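To see how 99.5% accuracy and zero detections can coexist, here is a tiny illustration on a made-up dev set whose class balance mirrors the lecture's numbers; the 1,000-window size and 5 positives are my own assumed figures.

```python
import numpy as np

# Illustrative dev set: 1,000 windows, only 5 of which contain the trigger
# phrase, mirroring the heavily unbalanced data the clipping procedure produces.
y_true = np.zeros(1000, dtype=int)
y_true[:5] = 1

# A degenerate model that always outputs 0 ("never heard the phrase"):
y_pred = np.zeros(1000, dtype=int)

accuracy = float((y_pred == y_true).mean())   # 0.995: looks great
recall = float(y_pred[y_true == 1].mean())    # 0.0: zero detections
```

On a dev set this unbalanced, accuracy alone tells you almost nothing about whether the trigger word is ever detected, which is exactly the point of the discussion that follows.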
right a [00:25:17] decisions that project leader right a tech leader or Co needs to make these [00:25:19] tech leader or Co needs to make these are actually like pretty much exactly [00:25:20] are actually like pretty much exactly the decisions you need to make and I [00:25:22] the decisions you need to make and I find that um one of the ways to gain [00:25:25] find that um one of the ways to gain this type of experience if you you know [00:25:27] this type of experience if you you know find a job with a good AI team and work [00:25:29] find a job with a good AI team and work with them for five years right and then [00:25:31] with them for five years right and then you actually live through this and you [00:25:32] you actually live through this and you see what they do but instead of needing [00:25:35] see what they do but instead of needing you to go and spend five years to see 10 [00:25:37] you to go and spend five years to see 10 examples of this I'm trying to step you [00:25:40] examples of this I'm trying to step you through maybe one example in in in one [00:25:42] through maybe one example in in in one hour so so instead of uh you know [00:25:45] hour so so instead of uh you know gaining this experience through work [00:25:48] gaining this experience through work experience which is great but takes many [00:25:50] experience which is great but takes many many years many many months uh hoping to [00:25:54] many years many many months uh hoping to you know let's just put you in the [00:25:55] you know let's just put you in the position of making these decisions you [00:25:56] position of making these decisions you can learn from that much faster right um [00:25:59] can learn from that much faster right um but [00:26:01] but so uh and and all the examples I'm [00:26:04] so uh and and all the examples I'm giving are actually completely realistic [00:26:05] giving are actually completely realistic right there either exactly or very [00:26:08] right there either 
similar to things I have seen in actual, very real projects. [00:26:13] So the question is: your learning algorithm gives this result, 99.5% accuracy, zero detections. What do you do? Let me mention some of the answers I really liked. [00:26:26] You know, when I think of building learning algorithms, the process is often: specify a dev set and/or test set that measures what you care about. You don't always have to do it, but it's good hygiene; it sharpens the clarity of your thinking to have a very clear specification of the problem. And I think one insight out of this is that your dev set can be really out of whack, so unbalanced that accuracy on your dev set doesn't translate to what you actually care about, because presumably the model is 99.5% accurate on the dev set as well, but this
performance is terrible. [00:27:06] So it's doing great on the dev set on your accuracy metric but giving terrible real performance. I think of it as good hygiene, good sound practice, to make sure you at least have a dev set and an evaluation metric that correspond more closely to what you care about. So making the dev set more balanced, with equal numbers of positive and negative examples, would be a good step toward that. [00:27:33] And then a few people talked about giving higher weights to the positive examples. One way to do this is to resample your training and dev sets to make them closer to a balanced ratio of positive to negative examples; that would be okay. The other way, without resampling, would be to just give the positive examples a greater weight in the loss.
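The resample-versus-reweight choice can be sketched in a few lines of NumPy (a minimal sketch; the array names, the 1-in-1,000 positive rate, and the 1:10 target ratio are illustrative assumptions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labels: roughly 1 positive clip per 1,000 (illustrative imbalance).
y = (rng.random(100_000) < 0.001).astype(int)
pos_idx = np.flatnonzero(y == 1)
neg_idx = np.flatnonzero(y == 0)

# Option A: resample. Keep all positives, subsample negatives toward a
# target ratio (1:10 here rather than a strict 1:1, so fewer negatives
# get thrown away).
keep_neg = rng.choice(neg_idx, size=10 * len(pos_idx), replace=False)
resampled_idx = np.concatenate([pos_idx, keep_neg])

# Option B: reweight. Keep every example, but give positives a larger
# per-example weight in the loss (inverse-frequency weighting).
weights = np.where(y == 1,
                   len(y) / (2 * len(pos_idx)),
                   len(y) / (2 * len(neg_idx)))
```

Either the index array or the weight vector would then feed the training loop; most frameworks expose the same two ideas as weighted samplers or class weights.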
[00:28:00] I would probably resample. Another thing you could do, in the interest of speed, even if it's not the mathematically most sound thing to do, is to change the target labels to be a bunch of ones right after the phrase. This is a hack; it's not formally rigorous, but if you've implemented the rest of this code already, it might be a reasonable, slightly hacky thing to do, and it might work well enough. [00:28:32] I don't know if I would want to try to write an academic research paper with this method; maybe you'd get away with it, but I think academic reviewers might raise their eyebrows. If you want something quick and dirty that just works, though, I think labeling the ones is fine.
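On frame-level labels, the hack might look like this (a sketch: the half-second window matches the lecture's example, while the 100 frames-per-second rate and the array shapes are my assumptions):

```python
import numpy as np

def extend_positive_labels(y, frames_per_sec=100, window_sec=0.5):
    """After each frame labeled 1 (the wake phrase just ended),
    also label the next ~0.5 s of frames as 1."""
    y = np.asarray(y).copy()
    n_extra = int(window_sec * frames_per_sec)
    for t in np.flatnonzero(y == 1):
        y[t + 1 : t + 1 + n_extra] = 1
    return y

# 10-second clip at 100 frames/s; "Robot, turn on" ends at frame 300.
labels = np.zeros(1000, dtype=int)
labels[300] = 1
labels = extend_positive_labels(labels)  # frames 300-350 are now 1
```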
So changing a bunch of labels to be one, so that a clip that ends just a little bit after "Robot, turn on" is still labeled one, would be pretty reasonable. [00:28:59] This would be saying that anywhere within maybe a 0.5-second window after "Robot, turn on" finishes, it's okay to turn on the light: anytime within that window, say within half a second after "Robot, turn on" has been said, you kind of want to be turning on the lamp. This would be a way to just get more labels of ones in there. Does that make sense? [00:29:37] [Student] With rebalancing your data sets, like the class imbalance: how does that translate to when you deploy this? You're not going to hear "Robot, turn on" as much, right? Like, one out of 1,000 might be reflective of what you expect
to see. [00:29:54] [Andrew] Yeah, right. So, how to put it... this is sort of a dev set and evaluation measure kind of question. A couple of the metrics that people often use when actually working on this are: one, when someone says "Robot, turn on," what is the chance that the lamp actually wakes up, that it turns on? And two, if no one is saying anything to the lamp, how often does it randomly turn on by itself, without you having said anything? Those are the two metrics people actually use. [00:30:27] And sometimes you could also try to combine them into a single-number evaluation metric. So you could define a data set to measure both of these things, and then hopefully find a way to combine them into a single real number, which I think is one of the ways we talked about in the videos as well.
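Those two numbers, plus one possible way to fold them into a single score, can be computed like this (a sketch over per-clip labels; the toy predictions and the particular combination `det - fa` are my assumptions, not the lecture's):

```python
import numpy as np

def wake_word_metrics(y_true, y_pred):
    """y_true[i] = 1 if clip i contains the wake phrase; y_pred[i] is
    the detector's decision. Returns (detection_rate, false_alarm_rate)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    detection_rate = (y_pred[y_true == 1] == 1).mean()    # P(fires | phrase said)
    false_alarm_rate = (y_pred[y_true == 0] == 1).mean()  # P(fires | not said)
    return detection_rate, false_alarm_rate

# Toy evaluation set: 4 clips with the phrase, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
det, fa = wake_word_metrics(y_true, y_pred)  # det = 0.75, fa = 1/6
score = det - fa  # one crude single-number combination
```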
[00:30:47] Does that make sense? But I think the question is really: what is it that satisfies a user need? [00:30:57] Oh, and one thing about the straightforward way of rebalancing: if you don't rebalance, then your whole data set just has very few positive examples. But if you throw away negative examples, cutting them down until you have exactly equal numbers of positives and negatives, you've actually thrown away a lot of negative examples. Does this make sense? [00:31:20] So one problem with the straightforward way of rebalancing is that in your 10-second test clip that we collected by running around Stanford, you have one example of "Robot, turn on," and so if you want exactly perfectly
balanced positives and negatives, it means you're allowed to clip out only one negative example. You can say that's a negative and that's a positive, and you can't clip out more negative examples from the rest. So if you insist on a perfect rebalance, you're actually throwing away a lot of negative examples that could be helpful for the learning algorithm. [00:32:15] So, you know, a lot of the workflow of building learning algorithms feels more like debugging, because what happens in a typical machine learning workflow is: you implement something and it doesn't work, so you figure out what the problem is and you fix it, like rebalancing, or reweighting, or adding more ones, and that fixes the current problem. And then after fixing the current problem, which
is the one we just solved, say, you then come across a new problem and you have to solve that; you fix that problem, and you come across another new problem. [00:32:49] So I find that when I'm working on a machine learning project, the workflow often feels more like software debugging than software development, because you're often trying to figure out what doesn't work and then trying to fix that; and after you fix that problem, another bug surfaces, and you squash that one too, and you kind of keep doing that until the algorithm works. So if I keep talking about "your algorithm doesn't work, what do you do next?", that's kind of the theme of today's presentation. That is what the workflow, the day-to-day work of developing a learning algorithm, is usually like: it doesn't work, you fix it, it
still doesn't work, you fix that, it still doesn't work, you fix it, and you do that enough times until it works. That is actually what working on a learning algorithm often looks like. [00:33:37] All right, so let's say you fix that problem: you've added a lot more ones, like I did on that previous board, so the data set isn't as unbalanced. And you conclude, through doing error analysis, that your algorithm is overfitting. [00:34:40] Okay, good. So let's say that you find it now achieves 98% accuracy on the training set and 50% accuracy on the dev set. So there's a very large gap between your training and your dev set performance, and that's a clear sign of overfitting.
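That train/dev comparison is easy to mechanize (a sketch; the thresholds and the target accuracy below are illustrative assumptions, and the only numbers from the lecture are the 98%/50% example):

```python
def diagnose(train_acc, dev_acc, target_acc=0.99,
             bias_gap=0.02, variance_gap=0.05):
    """Crude heuristic: train accuracy far below the target suggests
    high bias (underfitting); a large train-dev gap suggests high
    variance (overfitting). Thresholds are illustrative, not standard."""
    notes = []
    if target_acc - train_acc > bias_gap:
        notes.append("high bias: train accuracy far from target")
    if train_acc - dev_acc > variance_gap:
        notes.append("high variance: large train-dev gap (overfitting)")
    return notes or ["looks okay"]

notes = diagnose(0.98, 0.50)  # flags high variance only
```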
[00:35:02] And I think in one of the earlier questions someone talked about data augmentation; when you have this clear sign of overfitting, this is a good time to consider data augmentation. So let's say you go ahead and do it. For audio, this is how you could do data augmentation: collect a bunch of background audio. [00:35:23] You know, if you're trying to build a lamp that might go into people's homes, then you could go into your friends' homes and, with their permission, record what the background sound in their home sounds like: maybe people talking in the background, maybe the TV on in the background, whatever goes on in people's homes. [00:35:41] And then it turns out that if you take, say, a one-second clip of "Robot, turn on" and you add that to a background
clip, then you can synthesize an audio clip of what it would sound like in your friend's house if someone were to suddenly pop up and say "Robot, turn on" against the background sound of your friend's house. [00:36:10] And if you want to make this system robust... so actually, for example, I know someone who lives, unfortunately, close to a train station, and so their house actually has a lot of train noise from the Caltrain. So what you can do to make your system more robust is also take a clip of, say, Caltrain noise, and if you take that noise and take, in this case, a one-second or three-second clip of someone saying "Robot, turn on" and synthesize that on top of the train in the background, then what you end up with is a 10-second clip of someone saying "Robot, turn on" against a noisy,
train-in-the-background type of noise. [00:36:58] And so to do data augmentation, or data synthesis, you can take some one-second clips of people saying "Robot, turn on" against a quiet background, and then take some one-second clips of people saying random words (let's say "Cardinal," say "Stanford") and synthesize these against the train noise background. Then you would have what sounds like: train noise, train noise, "Robot, turn on," train noise, "Cardinal," train noise. [00:37:28] And then you could generate the labels: zeros there, ones there, and then zeros there. Because if this is what it actually sounded like in a user's home, you want the lamp to turn on after "Robot, turn on" but not after these random words; you can pick different random words.
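The overlay-and-label step can be sketched with waveforms as NumPy arrays (a minimal sketch; the 16 kHz sample rate, the random stand-in signals, and the amplitudes are assumptions; a real pipeline would load actual recordings and control the signal-to-noise ratio of the mix):

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16_000  # assumed sample rate

# Stand-ins for real recordings: 10 s of background noise and a 1 s
# clip of someone saying "Robot, turn on" in a quiet room.
background = 0.1 * rng.standard_normal(10 * sr)
wake_clip = rng.standard_normal(1 * sr)

def synthesize(background, wake_clip, start_sec, sr=16_000):
    """Overlay the wake-phrase clip onto the background at start_sec and
    emit sample-level labels: zeros everywhere, a 1 right after the
    phrase ends (the turn-on target)."""
    x = background.copy()
    s = int(start_sec * sr)
    x[s : s + len(wake_clip)] += wake_clip
    y = np.zeros(len(background), dtype=int)
    y[s + len(wake_clip)] = 1
    return x, y

x, y = synthesize(background, wake_clip, start_sec=4.0)
```

The same function, called with clips of random words instead of the wake phrase, would produce all-zero labels, giving the "train noise, Cardinal, train noise" negatives described above.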
[00:38:10] So what I'd like you to do is evaluate three different possible ways to collect noisy data, to collect this type of background data. [00:38:33] Let's say you and your team have brainstormed a few different ways to collect this type of background noise data, and let's say you've decided that you would like to collect 10 hours of it. Okay, so I'm going to present to you three options. [00:39:04] One is: run around Stanford and place microphones around Stanford or in your friends' homes. Do this with consent; in California you're actually not supposed to record people without their knowledge and consent. [00:39:23] The second is: download clips online. It turns out if you go
to YouTube, there are these 10-hour-long clips of, you know, rain noise or cars driving around. And again, if you do that, find something that's Creative Commons or appropriately licensed. [00:39:47] The third thing you could do is use Amazon Mechanical Turk: [00:39:59] on Mechanical Turk we can have people all around the world be paid modest amounts of money to submit audio clips. [00:40:08] So for the next exercise, and I want you to have this discipline, what I want you to do is estimate. Let's see, what time is it now? Okay, it's 12:30 p.m.
right now. [00:40:26] What I want you to do is write down three numbers in the next exercise, to estimate: if you were to go do this right now, by what time would you have finished? If you were to do option one, what time would you finish? If you were to do option two, what time would you finish? Option three? Your goal is to collect 10 hours of data through one of these mechanisms. Does that make sense? [00:40:52] So it's 12:30 p.m. now, and what I'd like you to do is just write down three numbers: what time will it be by the time you've collected 10 hours of data, you know, from around Stanford? And if you think you could do it by tonight, then write 9:00 p.m.;
if you think it'll take you one week, then write the date one week from now; whatever it is, just write down three numbers for these three activities, okay? Let's do this one relatively quickly; can people do this in maybe a minute and a half? [00:41:29] All right, cool. This is interesting. [00:41:41] What do people think? There's actually surprisingly large variability; I'll mention one thing that surprised me, and I'll give you my own assessment. [00:41:50] I think that, you know, when I'm leading startup teams, we tend to be very scrappy. And so, to collect 10 hours of data: if you have three friends with laptops, you can collect three hours of data per hour, because you've got three recordings going in parallel. So if I were doing this with, say, two other friends, I bet we could get this done by tonight, because if you need nine
hours of data, each person needs to collect three hours of data, and you run around Stanford and keep the microphones running; I bet I could get this done by 6 p.m., maybe even earlier, I don't know. [00:42:28] Downloading clips online is actually an interesting one; maybe about the same time. It turns out one tricky thing about downloading clips online is that there are people who have trouble sleeping at night, so they listen to highway noise or whatever, and so there are these, you know, 20-hour clips of highway noise on YouTube that you can find. But I don't know how those were generated, and I suspect a lot of them loop, meaning it's the same one hour played over and over. So I actually think it's harder than one might guess to get 10 hours of non-repetitive data, and it's one of
of [00:43:11] non-repetitive data and it's one of those things you know if I take an R of [00:43:13] those things you know if I take an R of high highway sound and loop it you can't [00:43:16] high highway sound and loop it you can't tell the difference because all highway [00:43:17] tell the difference because all highway sound sounds the same I just can't tell [00:43:20] sound sounds the same I just can't tell one minute of Highway sound from another [00:43:21] one minute of Highway sound from another one but um if you have one hour of [00:43:23] one but um if you have one hour of Highway sound looped 10 times the [00:43:26] Highway sound looped 10 times the learning Alm wasy perform much less well [00:43:28] learning Alm wasy perform much less well than if you have 10 hours of fresh [00:43:30] than if you have 10 hours of fresh Highway sound so this I would actually [00:43:32] Highway sound so this I would actually have a harder time doing I think I [00:43:34] have a harder time doing I think I probably I I I would Pro if I were doing [00:43:36] probably I I I would Pro if I were doing this I because of these problems I would [00:43:39] this I because of these problems I would probably budget until sometime [00:43:41] probably budget until sometime tomorrow right may maybe maybe 9:00 p.m. [00:43:44] tomorrow right may maybe maybe 9:00 p.m. 
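As an aside (my sketch, not part of the lecture): one rough way to check whether a downloaded "noise" file is a loop is to look at its normalized autocorrelation; a clip that repeats itself shows a near-1 peak at the loop lag. A minimal numpy sketch, with short synthetic signals standing in for real audio:

```python
import numpy as np

def find_loop_lag(signal, min_lag, threshold=0.9):
    """Return the smallest lag >= min_lag at which the clip nearly
    repeats itself, or None if no strong repeat is found."""
    x = signal - signal.mean()
    n = len(x)
    for lag in range(min_lag, n // 2):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        if denom == 0:
            continue
        corr = (a * b).sum() / denom  # normalized autocorrelation at this lag
        if corr >= threshold:
            return lag
    return None

# Synthetic stand-in: "one hour" of noise (here 1000 samples) looped 3 times,
# versus the same amount of fresh, non-repeating noise.
rng = np.random.default_rng(0)
hour = rng.standard_normal(1000)
looped = np.tile(hour, 3)
fresh = rng.standard_normal(3000)

print(find_loop_lag(looped, min_lag=500))  # detects the 1000-sample loop
print(find_loop_lag(fresh, min_lag=500))   # no strong repeat -> None
```

On real recordings you would work with downsampled audio and tolerate slightly lower thresholds, since re-encoding blurs exact repeats.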
[00:43:46] or something. Maybe that's doable, I'm not sure. Um, the one surprise to me was some people thought they could do this by tonight. Again, I've used Amazon Mechanical Turk; it's actually a huge process to set up Mechanical Turk, get people on board, and especially to get them on a microphone. I don't know, maybe you implement something in Flash so they can speak into their web browser, but Flash isn't well supported. So it's actually not that easy to get a lot of Turkers to do this, and the global supply of Turkers is also not unlimited. So if I were doing this, I would probably budget, I don't know, maybe a week or something, right? Hard to say, I'm not sure.

[00:44:24] Um, but so the specific opinion isn't that important; I want you to go through this exercise because this is how an efficient startup team should, you know, brainstorm a list of things, and then you all figure out how long you think it'll take to do these things. And I think we can have a debate about how high quality the data is. I think you can get very high quality data from this and from this; I just don't trust a lot of those online audio sources. But if this is really fast and you can get pretty high quality data, I would probably do this to collect the background sound to get going, right? I think that part of the workflow I see of, you know, fast-moving teams is pretty much exactly what you did, which is why I have that exercise of brainstorming the list of options, then really estimating, oh, what time can we get this done, and then using that to pick an option, right? Um, and then I want to just mention one last thing, which is that these differences matter, right? You know, I've built a lot of speech systems, a lot of machine learning systems.

[00:45:34] Oh, and I think, by the way, if you do everything we just described, and you'll see this later in a problem set, with pretty much this set of ideas that we just went through today, you can actually build a pretty decent trigger word detection system, or wake word detection system. In fact we ask you to do pretty much this in a later homework exercise. But now, when you get to that homework exercise, when you do RNNs, you know how you could come up with this sort of process yourself, if you didn't already know how to make these types of choices.

[00:46:06] Yeah, just one question: at what point in my research do I have to think about how my microphone will affect my results? At the beginning I could think it's not important, that my laptop microphone is the same as the one that is used when I run around Stanford or when I download clips, but it might mess up my data a lot. So at what point do I have to think about it?

[00:46:32] Yeah, so my advice, so how does your microphone affect your results, right? My advice would be to get something going quick and dirty, and then develop a dev set with the actual types of data you think you'll get on your real microphone, and then see if it is a problem. It may be; different microphones do have different characteristics. And if it is a problem, then go back and think about how you collect data that's more representative of how you test. Okay, I want to mention one more quick thing (oh, and I'll hand out the course surveys in a moment). I want to do something real quick, which is, I want to tell you why these things really matter. Which is, if this is performance, right, let's say, actually, let's say error, and this is time, right, and if this is today, and
[00:47:18] you're the CEO of this startup, remember, that's what we're doing in this lesson, and this is six months from now, and this is 12 months from now. Great. Um, you know, maybe a competitor, actually, I don't know, maybe because we talked about this so much in this class, maybe two of you are going to build this startup, but, a competitor. Um, but over time, most machine learning teams, you know, the error actually goes down over time as you work on problems, right? I mean, this is what I see in tons of practical projects: you work on the project, improve the system, and the error actually goes down over time as you work on this, over the next 12 months, say. Right, if you're really the CEO of a startup doing this, it turns out that the best startups have the discipline to constantly be the most efficient.

[00:48:06] Um, don't do something that takes you two days if you can get a similar result in one day. The difference is not that you're one day slower; the difference is that you're 2x faster, right? And having that mindset, if we can take this whole chart and compress it on the horizontal axis, then you want to be the startup that, you know, makes the same amount of progress in 6 months instead of 12 months, right? Because if you're able to do this, then your startup will actually perform much better in the marketplace, assuming, you know, accuracy is important, which it seems to be for wake words. And so don't think of this as saving you a day here and there; think of this as making your team twice as fast, and that's the difference between this level of performance and that level of performance. So that's why, when I'm, you know, building teams and executing these projects, I tend to be pretty obsessive about making sure we're very efficient in exploring the options. And don't wait till tomorrow to collect data of dubious quality when you have a better idea for collecting data by today, because the difference is not that you wasted 12 hours; the difference is you are twice as slow as a company, right? So I think, hopefully, through this example and your ongoing experiences throughout this quarter, you can continue to get better at this, right?

[00:49:22] Um, last thing we want to do: we're about halfway through the course, so we want to hand out a survey, an anonymous survey, to get some feedback from you about this class. And whenever we get these surveys, thanks to previous generations of students' feedback, we've been gradually making the class better. So I think, uh, Kian and I actually read all of these questions ourselves and
[00:49:49] try to find ways to take your feedback to improve the class. So if you can take, you know, five minutes to fill out the survey, you can hand it in, just drop it off anonymously up here in front; I'd be very grateful for your suggestions. Okay.

[00:50:04] Um, I think if you haven't entered your ID yet, you could still do so, but that's it for today. So please fill out the survey and anonymously just drop it off at the back or front, and then we'll wrap up. Okay, thank you.

================================================================================
LECTURE 007
================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 7 - Interpretability of Neural Network
Source: https://www.youtube.com/watch?v=gCJCgQW_LKc
---
Transcript

[00:00:04] Hi everyone, welcome to lecture number seven. So up to now, I believe, can you hear me in the back? Is it easy? Okay. So in the last set of modules that you've seen, you've learned about convolutional neural networks and how they can be applied to imaging. Notably, you've played with different types of layers, including pooling,
max pooling, average pooling, and convolutional layers. You've also seen some classification with the most classic architectures, all the way up to Inception and ResNets. And then you jumped into advanced applications like object detection with YOLO and the R-CNN and Faster R-CNN series, with an optional video, and finally face recognition and neural style transfer, which we talked a little bit about in the past lectures. So today we're going to build on top of everything you've seen in this set of modules to try to delve into the neural networks and interpret them. Because you noticed, after seeing the set of modules up to now, that a lot of improvements of the neural networks are based on trial and error: we try something, we do hyperparameter search, sometimes the model improves, sometimes it doesn't; we use a validation set to find the right set of methods that would make our model improve. It's not satisfactory from a scientific standpoint, so people are also searching: how can we find an effective way to improve our neural networks, not only with trial and error, but with theory that goes into the network, and visualizations.

[00:01:43] So today we will focus on that. We first will see three methods, saliency maps, occlusion sensitivity, and class activation maps, which are used to kind of understand what the decision process of the network was: given this output, how can we map the output decision back onto the input space, to see which parts of the input were discriminative for this output. And later on we will delve even more into the details of the network by looking at intermediate layers: what happens at an activation level, at a layer level, and at the network level, with another set of methods: gradient ascent, class model visualization, dataset search, and deconvolution. We will spend some time on the deconvolution because it's a cool type of mathematical operation to know, and it will give you more intuition on how the convolution works from a mathematical perspective. If we have time, we'll go over a fun application called Deep Dream, with super cool visuals, for some of you who know it. Okay, let's go. The Menti code is on the board if you guys need to sign up.

[00:02:55] So, as usual, we go over some contextual information and some small case studies, so don't hesitate to participate. You've built an animal classifier for a pet shop and you gave it to them. It's super good, it's been trained on ImageNet plus some other data, and what is a little worrying is that the pet shop is a little reluctant to use your network, because they don't understand the decision
process of the model. So how can you quickly show that the model is actually looking at a specific animal, let's say a cat, if I give it an input that is a cat? We've seen that together one time already, remember? So I'll go quickly. You have a network; here's a dog given as an input to a CNN. The CNN, assuming the constraint is that there is one animal per image, was trained with a softmax output layer, and we get a probability distribution over all animals: iguana, dog, car, cat, and crab. And what we want is to take the derivative of the score of dog and backpropagate it to the input, to know which parts of the input were discriminative for this score of dog. Does that make sense? Everybody remembers this? And so the interesting part is that this value has the same shape as x, so it's the size of the input; it's a matrix of numbers, and if the numbers are large in absolute value, it means the pixels at those locations had an impact on the score of dog, okay?

[00:04:24] What do you think the score of dog is? Is it the output probability? Now, what do I mean by the score of dog? Yep, the score of the dog, yeah, but is it the 0.85 output? That's what I need. Yes: it's the score pre-softmax, the score that comes before the softmax. So, as a reminder, here's a softmax layer and this is how it could be presented. You get a vector that is a set of scores that are not necessarily probabilities; they're just scores between minus infinity and plus infinity. You give them to the softmax, and what the softmax is going to do is output a vector where all the probabilities sum up to one, okay?

[00:05:24] And so the issue is, if instead of using the derivative of what we called y-hat last time, we use the score of dog, we will get a better representation here. The reason is, in order to maximize this number, the score of dog divided by the sum of the scores of all animals, or, maybe I should write, the exponential of the score of dog divided by the sum of the exponentials of the scores of all animals, one way is to minimize the scores of all the other animals, rather than maximizing the score of dog. So you see, maybe by moving a certain pixel we minimize the score of fish, and so this pixel will have a high influence on y-hat, the general output of the network, but it actually doesn't have an influence on the score of dog one layer before. Does it make sense? So that's why we would use the scores pre-softmax instead of the scores post-softmax, which are the probabilities, okay? And what's fun is, you cannot see it well here, but the slides are online if you want to take a look.
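To make the pre- vs post-softmax point precise (this little derivation is my addition, not spoken in the lecture): write $s_c$ for the pre-softmax score of class $c$ and $\hat{y}_c$ for the softmax output. Then

```latex
\hat{y}_{\text{dog}} = \frac{e^{s_{\text{dog}}}}{\sum_j e^{s_j}},
\qquad
\frac{\partial \hat{y}_{\text{dog}}}{\partial s_c}
  = \hat{y}_{\text{dog}}\left(\mathbf{1}[c=\text{dog}] - \hat{y}_c\right),
```

so the gradient of the probability $\hat{y}_{\text{dog}}$ mixes in every other class's score through the $-\hat{y}_c$ term: a pixel can raise $\hat{y}_{\text{dog}}$ purely by lowering, say, the score of fish. Backpropagating $s_{\text{dog}}$ itself avoids that.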
On your computer you can see that some of the pixels that are roughly at the same positions as the dog in the input image are stronger, so we see some white pixels here, and this can probably be used to segment the dog. So you could use a simple thresholding to find where the dog is, based on this pixel score map. It doesn't work too well in practice, we have better methods to do segmentation, but this can be done as well.

[00:07:03] So this is what is called saliency maps, and it's a common technique to quickly visualize what the network is looking at; in practice we will use other methods.

[00:07:14] So here's another contextual story. You've built the animal classifier; they're still a little scared, but you want to prove that the model is actually looking at the input image at the right position. You don't need to be quick, but you have to be very precise.

[00:07:36] Yeah, no, the saliency map is literally this thing here: it's the values of the derivative. You take the score of dog, you backpropagate the gradient all the way to the input, it gives you a matrix that is exactly the same size as x, and you use a specific color scheme to see which pixels are the strongest. Thank you.

[00:08:04] Okay, so here we have our CNN, the dog is forward propagated, and you get a probability score for the dog. Now you want a method that is more precise than the previous one, but not necessarily as fast, and this one we've talked about a little bit: it's occlusion sensitivity. So the idea here is to put a gray square on the dog, and we propagate this image, with the gray square at this position, through the CNN. What we get is another probability distribution that is probably similar to the one we had before, because the gray square doesn't seem to impact the image too much.
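Stepping back to the saliency map for a moment: the recipe just recapped (forward propagate, take the class's pre-softmax score, backpropagate it to the input, look at absolute values) can be sketched without any framework. This is an illustrative stand-in, not the lecture's code: the "network" is a tiny random two-layer MLP on a fake 8x8 image, with the backward pass written by hand, and class index 1 arbitrarily playing the role of "dog".

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, n_classes = 8, 8, 5                         # fake image size and classes
W1 = rng.standard_normal((16, H * W)) * 0.1       # hidden-layer weights
W2 = rng.standard_normal((n_classes, 16)) * 0.1   # output (pre-softmax) weights

def saliency_map(x_img, class_idx):
    """|d s_class / d x| reshaped to the image, where s is the
    pre-softmax score of the chosen class."""
    x = x_img.reshape(-1)
    z = W1 @ x                      # hidden pre-activation
    # s = W2 @ relu(z); we only need the gradient of s[class_idx] w.r.t. x
    da = W2[class_idx]              # d s_c / d a, with a = relu(z)
    dz = da * (z > 0)               # backprop through the ReLU
    dx = W1.T @ dz                  # d s_c / d x
    return np.abs(dx).reshape(H, W)

img = rng.standard_normal((H, W))
smap = saliency_map(img, class_idx=1)   # class 1 stands in for "dog"
print(smap.shape)   # (8, 8): same shape as the input, as in the lecture
```

With a real CNN you would let the framework's autograd compute the input gradient instead of writing the backward pass manually; the visualization step (absolute values, one color scheme) is the same.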
[00:08:39] At least from a human perspective, we still see a dog, right? So the score of "dog" might be high: 83%, probably. [00:08:44] What we can say is that we can build a probability map corresponding to the class "dog", and we write down on this map how confident the network is when the gray square is at this testing location. For our first location it seems that the network is very confident, so let's put a red square here. [00:09:04] Now I'm going to move the gray square a little bit, shifting it just as we do for convolution, and I'm going to send this new image through the network again. It's going to give me a new probability distribution output, and the score of "dog" might change. Looking at this score, I'm going to say: okay, the network is still very confident that there is a dog here. And I continue; I shift it again; same thing, the network is still very confident that there is a dog.

[00:09:31] Now I shift the square vertically down, and I see that the face of the dog is partially occluded. The probability of "dog" will probably go down, because the network cannot see one eye of the dog and is not confident that there is a dog anymore. So the confidence of the network probably went down, and I'm going to put a square that is tending toward blue. [00:09:56] I continue and shift it again, and here we don't see the dog's face anymore at all, so the network might classify this as a chair, because the chair is more obvious than the dog now; the probability score of "dog" might go down, so I'm going to put a blue square here. And I continue: here we don't see the tail of the dog; it's still fine, the network is pretty confident. And so on. [00:10:24] What I will look at now is this probability map, which tells me roughly where the dog is.
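The sliding-gray-square procedure can be sketched in a few lines of NumPy. The `model_dog_prob` function below is a stand-in, not a real CNN: it just pretends the dog evidence lives in the top-left patch of the image so that the script is self-contained.

```python
import numpy as np

# Occlusion-sensitivity sketch: slide a gray square over the image and
# record the class probability at each position.

def model_dog_prob(img):
    # Hypothetical classifier: pretend the "dog" lives in the top-left
    # 4x4 region, so confidence drops when that region is grayed out.
    evidence = img[:4, :4].sum()
    return 1.0 / (1.0 + np.exp(-0.5 * (evidence - 4.0)))

def occlusion_map(img, patch=3, gray=0.0, stride=1):
    rows = (img.shape[0] - patch) // stride + 1
    cols = (img.shape[1] - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = img.copy()
            r, c = i * stride, j * stride
            occluded[r:r + patch, c:c + patch] = gray  # the gray square
            heat[i, j] = model_dog_prob(occluded)      # re-run the model
    return heat

img = np.ones((8, 8))
heat = occlusion_map(img)
# confidence dips where the square covers the "dog" region (top-left)
print(heat.shape)  # (6, 6)
```

Shrinking `patch` gives a finer map at the cost of more forward passes, which is exactly the precision/cost trade-off mentioned above.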
[00:10:30] Here we used a pretty big gray square compared to the size of the image; the smaller the gray square, the more precise this probability map is going to be. Does that make sense? So, if you have time, you can take your time with the pet shop to explain to them what's happening. [00:10:59] Yeah, we will see that in the next slide; that's correct.

[00:11:01] So let's see more examples. Here we have three classes, and these images have been generated by Matthew Zeiler and Rob Fergus; their paper "Visualizing and Understanding Convolutional Networks" is one of the seminal papers that has led the research in visualizing and interpreting neural networks, so I'd advise you to take a look at it, and we will refer to it many times in this lecture. So now we have three examples: a Pomeranian, which is this type of cute dog; a car wheel, which is the true class of the second image; and an Afghan hound, which is this type of dog here in the last image. If you do the same thing as we did before, that's what you would see. [00:11:47] Just to clarify: where we see a blue color, it means that when the gray square was positioned, or centered, at this location, the network was less confident that the true class was Pomeranian. And in fact, if you look at the paper, they explain that when the gray square was here, the confidence in Pomeranian went down because the confidence in tennis ball went up; the Pomeranian dog has a tennis ball in its mouth. [00:12:16] Another interesting thing to notice is in the last picture: you see that there is a red color in the top left of the image, and this is exactly what you mentioned, Adam: when the square was on the face of the human, the network was much more confident that the true class was the dog, because we removed information that was very meaningful to the network, namely the face of the human. And similarly, if you put the square on the dog, the class the network output was "human". Does that make sense? Okay, so this is called occlusion sensitivity, and it's the second method you have now seen for interpreting where the network looks on an input.

[00:13:03] So let's move on to class activation maps. I don't know if you remember, but two weeks ago, when Pranav discussed the techniques he uses in healthcare, he explained that he takes a chest x-ray and manages to tell the doctor where the network is looking when predicting a certain disease based on that chest x-ray, right? Remember that? That was done through class activation maps, and that's what we're going to see now. [00:13:39] One important thing to notice is that we've discussed how classification networks seem to have a very good localization ability, and we can see it with the two methods we previously discussed. The same goes for those of you who have read the YOLO paper you studied in this set of modules: the YOLOv2 algorithm was first trained on classification, because classification has a lot of data, a lot more than object detection; trained on classification, it built a very good localization ability, and then it was fine-tuned and retrained on object detection datasets. Okay. And so the core idea of class activation maps is to show that CNNs have a very good localization ability even if they were trained only on image-level labels. So we have this network; it's a very classic network used for classification.
[00:14:33] We give it a kid and a dog; this class activation map work is coming from an MIT lab, Zhou et al. in 2016. You forward-propagate this image of a kid with a dog through the network, which has a classic series of conv and pooling layers, several of them, and at the end you usually flatten the last output volume of the conv and run it through several fully connected layers, which play the role of a classifier, send it to a softmax, and get the probability output. [00:15:08] Now what we're going to do is prove that this CNN generalizes to localization. So we're going to convert this same network into another network, and the only part that is going to change is the last part. The downside of using flatten plus fully connected is that you lose all spatial information, right? You have a volume that has spatial information (although it has gone through some max pooling, so it has been downsampled and you have lost some part of the spatial localization), and flattening kills it: you flatten it, you run it through a fully connected layer, and then it's over; it's super hard to find out where an activation corresponds to in the input space. [00:15:50] So instead of using flatten plus fully connected, we're going to use global average pooling (we're going to explain what it is), then a fully connected softmax layer, and get the probability output. And we're going to show that this network can be trained very quickly, because we just need to train one layer, the fully connected one here, and it can show where the network looks, the same as the previous network.

[00:16:13] So let's talk about it in more detail. Assume this was the last conv layer, and it outputs a volume that is sized, to simplify, 4 x 4 x 6: six filters were used in the last conv, so we have six feature maps. Does that make sense? I'm going to convert this, using global average pooling, to just a vector of six values. What is global average pooling? It's just taking these feature maps, each of them, and averaging each one into one number. So now, instead of a 4 x 4 x 6 volume, I have a 1 x 1 x 6 volume, but we can call it a vector. Does that make sense? [00:16:56] What's interesting is that each number actually holds the information of the whole feature map that came before it, averaged into one value. I'm going to put these in a vector, and I'm going to call them activations as usual: a1, a2, a3, a4, a5, a6. As I said, I'm going to train a fully connected layer here with the softmax activation, and the outputs are going to be the probabilities. [00:17:24] What is interesting about that is that the feature maps here, as you know, will contain some visual patterns.
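The global-average-pooling step just described is one line of NumPy on the toy 4 x 4 x 6 volume (random values here, standing in for real conv activations):

```python
import numpy as np

# Global average pooling: average each 4x4 feature map down to one number,
# turning a 4x4x6 volume into a vector of 6 activations a1..a6.
rng = np.random.default_rng(1)
volume = rng.normal(size=(4, 4, 6))   # H x W x channels (toy values)
a = volume.mean(axis=(0, 1))          # shape (6,): one number per feature map
print(a.shape)                        # (6,)
```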
[00:17:30] So if I look at the first feature map, I can plot it here; these are the values. Of course, this one is much more granular than 4 x 4 (it's not a 4 x 4, it's much bigger), but let's say this is the feature map, and it seems that the activations have found something here: there was a visual pattern in the input that activated this feature map, and the filters which generated it, at this location. Same for the second one: there are probably two objects, or two patterns, that activated the filters that generated this feature map. And so on; we have six of those. [00:18:09] After I've trained my fully connected layer, I look at the score of "dog": the score of "dog" is 91%. What I can do is ask: this 91%, how much of it came from each of these feature maps? And the way I can know is that I now have a direct mapping through the weights. I know that weight number one (this edge, you see it) is how much the score depended on the orange feature map. Does that make sense? The second weight, if you look at the green edge, is the weight that multiplied this feature map to give birth to the "dog" output, so this weight is telling me how much this feature map, the green one, influenced the output. Does that make sense? [00:19:07] So now what I can do is take a weighted sum of all these feature maps, and if I just do this weighted sum, I will get another feature map, something like that. And you notice that this one seems to be highly influenced by the green feature map. It probably means the weight there was higher, and it probably means that the second filter of the last conv was the one that was looking at the dog. Does that make sense? Okay. [00:19:44] And then, once I get this feature map: this feature map is not the size of the input image, right? It's the size of the height and width of the output of the last conv. So the only thing I'm going to do is upsample it back, simply, so that it fits the size of the input image, and I'm going to overlay it on the input image to get my class activation map. [00:20:06] The reason it's called a class activation map is that this feature map depends on the class you're talking about. If I was using, let's say, "car" here, the weights would have been different, right? Look at the edges that connect the output activation to the activations of the previous layer: those weights are different, so if I sum all of these feature maps, I'm going to get something else. Does that make sense? So this is class activation maps. And in fact there is a dog here and there's a human there.
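Putting those steps together, here is a minimal CAM sketch in NumPy with the lecture's toy shapes. Everything is a stand-in: random feature maps and weights, and block-repeat upsampling via `np.kron`, which is just the simplest possible way to scale the map back up to the input size.

```python
import numpy as np

# Class-activation-map sketch: weight each feature map of the last conv
# by the "dog" row of the fully connected layer, sum them, then upsample
# the result to (a hypothetical) input size.

rng = np.random.default_rng(2)
fmaps = rng.normal(size=(4, 4, 6))   # last-conv output: 4x4, six feature maps
W = rng.normal(size=(3, 6))          # FC weights: 3 classes x 6 activations
dog = 1                              # class of interest

# weighted sum over the channel axis: M_dog(i,j) = sum_k w_k * A_k(i,j)
cam = np.tensordot(fmaps, W[dog], axes=([2], [0]))   # shape (4, 4)

# naive nearest-neighbour upsample to a 32x32 "input" for overlaying
cam_up = np.kron(cam, np.ones((8, 8)))               # shape (32, 32)
print(cam.shape, cam_up.shape)
```

Using a different class row, say `W[2]`, changes the weights and therefore the map, which is exactly why it is a *class* activation map.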
[00:20:45] And what you can notice is that, if I look at the class "human", weight number one might be very high, because it seems that the visual pattern that activated the first feature map was the face of the kid. Okay. [00:21:00] So what is super cool is that you can take your network and just change the last few layers into global average pooling plus the fully connected softmax layer, and you can do that and visualize very well; it requires a small fine-tuning. [00:21:19] Yeah, so it's a different vocabulary. I would use "saliency maps" for the backpropagation up to the pixels, while "class activation maps" are related to one class. It's not a backpropagation at all; it's just upsampling to the input space based on the feature maps of the last conv, mostly just examining the weights. Not so much of a backpropagation. Yes, any other questions on class activation maps?
[00:21:53] Yeah, that's a good question: does taking the average kill the spatial information? So let me write down a formula here. This is the score that we're interested in; let's say the dog class, c. What you could say is that this score is

S_c = sum_{k=1..6} w_k^c * a_k

where w_k^c is the weight that connects the output activation to the previous layer, and a_k is the activation of the previous layer, i.e. the global average pooling of the k-th feature map:

a_k = (1/16) * sum_{i,j} A_{i,j}^k

with k indexing the feature map, (i,j) the location, and 1/16 the normalization over the 4 x 4 map. Can you see in the back, roughly? [00:22:46] Okay, I can switch the two sums, so I can say that this thing is

S_c = sum_{i,j} (1/16) * sum_{k=1..6} w_k^c * A_{i,j}^k

Does this make sense? I still have the locations; I just moved the sum around. And what I could do is say that the inner term,

M_c(i,j) = sum_{k=1..6} w_k^c * A_{i,j}^k

is the score of the class activation map at location (i,j), the class score for that location, and I'm summing it over all locations. [00:23:57] So just by flipping what the average pooling was doing over the locations, I can say that by weighting, using my weights, all the activations at a specific location across all the feature maps, I get the score of this position with regard to the final output. Does that make sense? So we were not losing the spatial information. The reason we're not losing it is that we know what the feature maps are, right? We know what they are, and we know that they've been averaged, so we can map it back exactly.
[00:24:42] That's because we assume that each filter that generated these feature maps detects one specific thing. So if this is the feature map, and we assume the filter was detecting "dog", we're going to see something just here, meaning that there is a dog here; and if there was a dog in the lower part of the image, we would also have strong activations in that part. [00:25:15] If you want to see more of the math behind it, check the papers, but this is the intuition: you can flip the summations using the global average pooling and show that you keep the spatial information. The thing is, you do the global average pooling, but you don't lose the feature maps, because you know where they were, from the output of the conv, right? So you're not deleting this information. Does that make sense?

[00:25:51] Okay, let's move on and watch a full video of how a class activation map works. [00:25:55] This video is from Kyle McDonald, and it's live, so it's very quick; you can see that the network is looking at the speedboat. [00:26:30] Okay. So now, the three methods we've seen are methods that roughly map the output back to the input space, helping us visualize which parts of the input were the most discriminative in leading to this output and the decision of the network. Now we're going to try to delve into more detail, into the intermediate layers of the network, and try to interpret how the network sees our world, not necessarily related to a specific input, but in general. [00:27:03] Okay, so the pet shop now trusts your model, because you've used occlusion sensitivity, saliency maps, and class activation maps to show that the model is looking at the right place. But they got a little scared when you did that, and they asked you to explain what the model thinks a dog is.
model thinks a dog is [00:27:20] explain what the model thinks a dog is so you have this trained convolutional [00:27:22] so you have this trained convolutional neural network and you have an output [00:27:25] neural network and you have an output probability yep let me take one non [00:27:34] probability yep let me take one non image data that's that's a good question [00:27:36] image data that's that's a good question it's actually so the reason we're seeing [00:27:38] it's actually so the reason we're seeing images what most of the research has [00:27:39] images what most of the research has been focusing on images if you look at [00:27:43] been focusing on images if you look at electric time series data [00:27:45] electric time series data so either speech or natural language the [00:27:48] so either speech or natural language the main way to visualize those is with the [00:27:51] main way to visualize those is with the attention method are you familiar with [00:27:54] attention method are you familiar with that so in the next set of modules that [00:27:56] that so in the next set of modules that you're going to start this week and [00:27:57] you're going to start this week and you're going to study in the next two [00:27:59] you're going to study in the next two weeks you will see a visualization [00:28:00] weeks you will see a visualization method called attention models which [00:28:03] method called attention models which will tell you which part of a sentence [00:28:05] will tell you which part of a sentence was important let's say to output a [00:28:08] was important let's say to output a number like assuming you're doing [00:28:11] number like assuming you're doing machine translation you know some [00:28:13] machine translation you know some languages they don't have a direct [00:28:14] languages they don't have a direct one-to-one mapping it means I might say [00:28:16] one-to-one mapping it means I might say I love cats but in another language 
[00:28:20] I love cats but in another language maybe this same sentence would be [00:28:22] maybe this same sentence would be attached I love or something it's fit [00:28:24] attached I love or something it's fit and you want an attention model to seek [00:28:27] and you want an attention model to seek to show you that the cat was referring [00:28:29] to show you that the cat was referring to the second I think it's okay sorry [00:28:33] to the second I think it's okay sorry guys [00:28:38] so going back to the presentation now [00:28:41] so going back to the presentation now we're going to delve into inside the [00:28:44] we're going to delve into inside the network and so the new thing is the pet [00:28:47] network and so the new thing is the pet shop is little scared and ask you to [00:28:49] shop is little scared and ask you to explain what the network think a dog is [00:28:51] explain what the network think a dog is what's the representation of dog for the [00:28:53] what's the representation of dog for the network so here we're going to use a [00:28:55] network so here we're going to use a method that we've already seen together [00:28:56] method that we've already seen together called gradient ascent which is defining [00:29:00] called gradient ascent which is defining an objective that is technically the [00:29:04] an objective that is technically the score of the dog - a regularization term [00:29:07] score of the dog - a regularization term what the regularization term is doing is [00:29:09] what the regularization term is doing is it's saying that X should look natural [00:29:11] it's saying that X should look natural it's not necessarily l2 regularization [00:29:13] it's not necessarily l2 regularization can be something else and we will [00:29:16] can be something else and we will discuss it in the next slide but don't [00:29:18] discuss it in the next slide but don't think about it right now what we will do [00:29:20] think about it right now what we 
will do is we will compute the back propagation [00:29:22] is we will compute the back propagation of this objective function all the way [00:29:24] of this objective function all the way back to the input and perform gradient [00:29:27] back to the input and perform gradient ascent to find the image that maximizes [00:29:29] ascent to find the image that maximizes the score of the dog so it's an [00:29:31] the score of the dog so it's an iterative process takes longer than the [00:29:33] iterative process takes longer than the class activation map and we repeat the [00:29:37] class activation map and we repeat the process forward propagate X compute the [00:29:39] process forward propagate X compute the objective back propagate and update the [00:29:41] objective back propagate and update the pixels and so on you guys are familiar [00:29:42] pixels and so on you guys are familiar with that so let's see what what what we [00:29:45] with that so let's see what what what we can visualize doing that so actually if [00:29:48] can visualize doing that so actually if you take an image net classification [00:29:50] you take an image net classification network and you perform this on the [00:29:52] network and you perform this on the classes of goose or ostrich or Kitfox [00:29:54] classes of goose or ostrich or Kitfox Husky Dalmatians you can see what the [00:29:57] Husky Dalmatians you can see what the network is looking at or what the [00:29:59] network is looking at or what the network think that almassian is so for [00:30:02] network think that almassian is so for the Dalmatian you can see some some [00:30:04] the Dalmatian you can see some some black dots on a white background somehow [00:30:07] black dots on a white background somehow but these are still quite hard to [00:30:10] but these are still quite hard to interpret it's not super easy to see and [00:30:12] interpret it's not super easy to see and even worse here on the screen better on [00:30:15] even worse 
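The loop (forward propagate x, compute the objective, backpropagate to the input, update the pixels) can be sketched with a toy differentiable score so the gradient is available in closed form; in a real network the gradient would come from backpropagation, and every name here is hypothetical:

```python
import numpy as np

# Toy stand-in for "score of class dog": a linear scorer s(x) = sum(w * x),
# so d(objective)/dx has a closed form. In a real network you would get
# d(score)/d(input) by backpropagation through the frozen weights instead.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))                  # pretend 8x8 "image" input

def objective_grad(x, lam=0.5):
    """Gradient of  s(x) - lam * ||x||^2  with respect to the input x."""
    return w - 2.0 * lam * x

x = np.zeros((8, 8))                         # start from a blank image
for _ in range(200):
    x += 0.1 * objective_grad(x)             # gradient ASCENT on the pixels

# For this toy objective the maximizer is w / (2 * lam) = w; check we got there.
print(np.abs(x - w).max() < 1e-6)            # True
```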
But you can see a fox: here you can see orange color for the fox, which means that pushing the pixels toward an orange color would actually lead to a higher score for the kit fox at the output. [00:30:28] If you use a better regularization than L2 you might get better pictures: this one is for flamingo, this one for pelican, and this one for hartebeest. A few interesting things to see: in order to maximize the score of flamingo, what the network visualized is many flamingos. It means that ten flamingos lead to a higher score for the class flamingo than one flamingo does, for the network. [00:30:57] Talking about regularization, what does L2 regularization say? It says that for visualizing we don't want extreme pixel values: it doesn't help much to have one pixel with an extreme value, one pixel with a low value, and so on. So we're going to regularize all the pixels so that all the values stay close to each other, and then we can rescale them between 0 and 255 if we want. One thing to notice is that the gradient ascent process doesn't constrain the inputs to be between 0 and 255 (you can go to plus infinity, potentially), while an image is stored with numbers between 0 and 255, so you might want to clip as well; this is another type of regularization. [00:31:38] One thing that led to beautiful pictures is what Jason Yosinski and his team did: they forward propagated an image, computed the score, computed the objective function, backpropagated, updated the pixels, and then blurred the picture. Because what is not useful for visualizing is high-frequency variation between pixels: it doesn't help to visualize if you have many pixels close to each other that have many different values; instead you want a smooth transition among pixels. This is another type of regularization, called Gaussian blurring. [00:32:15] Okay, so this method actually makes a lot of sense in scientific terms: you're maximizing an objective function that gives you what the network sees as a flamingo, which would maximize the score of flamingo. So we also call it class model visualization. [00:32:41] Yes? The question is whether a more realistic class model visualization corresponds to a more accurate model. It's hard to judge the accuracy of the model from this visualization; it's a good way to validate that the network is looking at the right thing. We're going to see more of this later. [00:33:01] I think the most interesting part is actually on this slide: we did it for the class score, but we could have done it with any activation. Let's say I stop in the middle of the network and I define my objective function to be this activation.
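The two extra regularizers mentioned a moment ago, clipping pixels back to the displayable [0, 255] range and occasionally blurring the image between updates, can be folded into a single ascent step. A sketch, where the blur is a tiny separable kernel standing in for a real Gaussian blur and all names are illustrative:

```python
import numpy as np

def blur(img):
    """Tiny separable blur, a stand-in for a real Gaussian blur."""
    k = np.array([0.25, 0.5, 0.25])
    img = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, img)
    img = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, img)
    return img

def ascent_step(x, grad, lr=1.0, do_blur=False):
    """One gradient-ascent update with the two regularizers above:
    clip pixels to [0, 255], and optionally blur to suppress
    high-frequency pixel-to-pixel variation."""
    x = np.clip(x + lr * grad, 0.0, 255.0)
    return blur(x) if do_blur else x

x = np.full((6, 6), 128.0)
g = np.zeros((6, 6))
g[3, 3] = 1e6                                # a huge gradient at one pixel
x = ascent_step(x, g)                        # clipped: stays at 255, not 1e6
x2 = ascent_step(x, np.zeros_like(x), do_blur=True)
print(x.max(), x2[3, 3] < x[3, 3])           # 255.0 True: blur spreads the spike
```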
I'm going to backpropagate and find the input that maximizes this activation. That will tell me what this activation fires for, which I think is even more interesting than just looking at the input. Does it make sense that we could do it on any activation? Yep. Any questions on that? [00:33:49] Okay, so now we're going to use another trick, which is dataset search. It's actually one of the most useful ones, I think; not fast, but very useful. [00:33:55] The pet shop loved the previous technique, and asks if there are other alternatives to show what an activation in the middle of a network is thinking. You take an image, forward propagate it through the network, and get your output. Now what you're going to do is select a feature map. Let's say we're at this layer, and the feature maps are of size 5 x 5 x 256; it means the conv layer had 256 filters, right? [00:34:36] You're going to select one of the 256 feature maps, and you're going to run a lot of data forward through the network and look at which data points had the maximum activation of this feature map. [00:34:55] Let's say we do it with the first feature map, and we notice that these are the top five images that really fired it, with high activations on the feature map. What it tells us is that probably this feature map is detecting shirts. We could do the same thing with the second feature map: we look at which data points maximized the activations of this feature map over a lot of data, and these are the top five images we got. It probably means that this other feature map seems to be activated when seeing edges.
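Dataset search boils down to scoring every image by the activation of one chosen feature map and sorting. A toy numpy sketch, where `feature_map_activation` is a hypothetical stand-in for the forward pass up to the chosen layer (here, a crude "edge detector"):

```python
import numpy as np

rng = np.random.default_rng(1)

def feature_map_activation(img):
    fmap = np.abs(np.diff(img, axis=1))      # (H, W-1) pretend feature map
    return fmap.max()                        # summarize by its peak activation

# Mostly smooth images, plus one (index 42) with a hard vertical edge.
dataset = [np.full((8, 8), rng.uniform(0, 255)) + rng.normal(0, 3, (8, 8))
           for _ in range(100)]
edge = np.zeros((8, 8))
edge[:, :4] = 255.0
dataset[42] = edge

scores = np.array([feature_map_activation(img) for img in dataset])
top5 = np.argsort(scores)[::-1][:5]          # indices of the 5 strongest images
print(top5[0])                               # 42: the hard-edge image wins
```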
So the second feature map is much more likely to appear earlier in the network than later, obviously. [00:35:44] One thing that you may ask is that these images seem cropped: I don't think this was an image in the dataset; it's probably a sub-part of an image. What do you think this crop corresponds to? Any idea how we cropped the images, and why they are cropped? Why didn't I show you the full images; how was I able to show you the cropped part? [00:36:33] That's correct. So let's say we pick an activation in the network. For a convolutional neural network, this activation often doesn't see the entire input image; what it sees is a subspace of the input image. Does that make sense? So let's look at another slide. [00:36:57] Here we have a picture of Younes, 64 x 64 x 3; it's our input. We run it through a five-layer convnet, and now we get an encoding volume that is much smaller in height and width but bigger in depth. If I ask what this activation is seeing, you can map it back: you look at the stride and the filter sizes you've used, and you could say that this is the part that this activation is seeing. It means the pixel that was up there had no influence on this activation. And it makes sense when you think of it; the easiest way to think about it is to look at the top-left entry of the encoding volume. You have the input image, you put a filter here, and this filter gives you one number. This number, this activation, only depends on this part of the image. But then if you add a convolution after it, it will take more filters, and so the deeper you go, the larger the part of the image the activation will see. So if you look at an activation in layer 10, it will see a much larger part of the input than an activation in layer 1. Does that make sense?
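The growth of the receptive field with depth follows a standard recursion, which can be checked in a few lines of Python; the layer configurations below are made up for illustration:

```python
def receptive_field(layers):
    """Receptive-field size of one unit after a stack of conv/pool layers.

    layers: list of (filter_size, stride) pairs. Standard recursion: each
    layer widens the field by (f - 1) * jump, where `jump` is the distance,
    in input pixels, between adjacent units of the current layer.
    """
    r, jump = 1, 1
    for f, s in layers:
        r += (f - 1) * jump
        jump *= s
    return r

# A unit one 3x3/stride-2 layer deep vs. five such layers deep:
print(receptive_field([(3, 2)]))        # 3 pixels of the input
print(receptive_field([(3, 2)] * 5))    # 63 pixels: deeper units see far more
```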
[00:38:11] So that's why the pictures that I showed here are very small crops of the image, which means the activation I was talking about is probably early in the network: it sees a much smaller part of the input. [00:38:32] Yeah, the question is how we know which part of the image one activation responds to. What you look at is which activation was maximum; you look at this one, and then you match this one back to the crop. Makes sense? And here again, same thing: this one would correspond more to the center of the image. This intuition makes sense? Okay, good. [00:39:09] So let's talk about deconvolution now. It's going to be the hardest part of the lecture, but it will probably help with more intuition on the convolution. You remember the generative adversarial networks scheme, and we said that, given a code, the generator is able to output an image. There is something happening here that we didn't talk about: how can we start with a 100-dimensional vector and output a 64 x 64 x 3 image? That seems weird. You might say we could use a fully connected layer with a lot of neurons to upsample; in practice this is one method. Another one is to use a deconvolution network. [00:39:54] Convolutions encode the information in a volume that is smaller in height and width and deeper in depth, while the deconvolution does the reverse: it upsamples the height and width of an image. So it would be useful in this case. [00:40:13] Another case where it would be useful is segmentation. You remember our case study for segmentation: live-cell microscopic images of cells. You give them to a convolutional network, and it's going to encode them, so it's going to lower the height and width. The interesting thing about this encoding in the middle is that it holds a lot of meaningful information, but what we ultimately want is a segmentation mask, and the segmentation mask has to have the same height and width as the input image, so we need a deconvolution network to upsample it. So deconvolutions are used in these cases. [00:40:50] Today the case we're going to talk about is visualization. Remember the gradient ascent method we talked about: we define an objective function by choosing an activation in the middle of the network, we want the objective to be equal to this activation, and we find the input image that maximizes this activation through an iterative process. Now we don't want to use an iterative process; we want a reconstruction of this activation directly in the input space, in one backward pass. [00:41:17] So let's say I select this feature map out of the 5 x 5 x 256 volume. What I'm going to do is identify the max activation of this feature map. Here it is, this one, third column, second row. I'm going to set all the others to zero; just this one I keep, because it seems that this one has detected something, and I don't want to talk about the others. I'm going to try to reconstruct, in the input space, what this activation has fired for. [00:41:57] So I'm going to compute the reverse mathematical operations of pooling, ReLU and convolution: I will unpool, "un-ReLU" (let's say; the word doesn't exist, so don't use it) and deconv, and I will do it several times, because this activation went through several of them. I do it again and again, until I see that this specific activation that I selected in the feature map fired because it saw the ears of the dog. And as you see, this image is cropped again: it's not the entire image, it's just the part that the activation has seen. And if you look at where the activation is located on the feature map, it makes sense that this is the part that corresponds to it. [00:42:45] So that's the higher-level intuition. We're going to delve into it and see what we mean by unpool, what we mean by un-ReLU, and what we mean by deconv. Okay. [00:42:57] Yes? The question is what happens if we leave the other activations at whatever values they were and reconstruct the whole image. If we don't zero out all the other activations, the reconstruction would be messier. It doesn't necessarily mean you won't get the full image, because the other activations probably didn't even fire, meaning they didn't detect anything else; it's just that it's going to add some noise to this reconstruction.
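A 1D toy version of the reversed path, under the usual assumptions: max-pooling remembers "switches" (where each max came from), un-ReLU just re-applies ReLU, and the deconv is the transpose of the forward convolution. Everything below is an illustrative sketch, not the lecture's exact pipeline:

```python
import numpy as np

def maxpool_with_switches(x, size=2):
    xs = x.reshape(-1, size)
    return xs.max(axis=1), xs.argmax(axis=1)   # values + remembered positions

def unpool(y, switches, size=2):
    out = np.zeros((y.size, size))
    out[np.arange(y.size), switches] = y       # put each max back in its slot
    return out.reshape(-1)

def deconv1d(y, w):
    # The transpose of a 'valid' stride-1 cross-correlation with w is a
    # 'full' convolution with w: length n goes back to n + len(w) - 1.
    return np.convolve(y, w, mode="full")

w = np.array([1.0, 2.0, 1.0])
x = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 0.0])
h = np.convolve(x, w[::-1], mode="valid")      # forward conv: length 4
p, sw = maxpool_with_switches(h)               # forward pool: length 2
r = deconv1d(np.maximum(unpool(p, sw), 0.0), w)  # unpool -> un-ReLU -> deconv
print(r.shape)                                 # (6,): back to the input length
```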
that it's gonna is going to add some noise to this reconstruction [00:43:30] add some noise to this reconstruction okay so let's talk about the convolution [00:43:32] okay so let's talk about the convolution a little bit on the board so to start [00:43:38] a little bit on the board so to start with the convolution and you guys can [00:43:43] with the convolution and you guys can take notes if you want we're going to [00:43:44] take notes if you want we're going to spend about 20 minutes on the board now [00:43:46] spend about 20 minutes on the board now to discuss the convolution okay to [00:43:55] to discuss the convolution okay to understand the deconvolution we first [00:43:57] understand the deconvolution we first need to understand the convolution we've [00:43:59] need to understand the convolution we've seen it from a computer science [00:44:02] seen it from a computer science perspective but actually what we're [00:44:04] perspective but actually what we're going to do here is we're going to frame [00:44:06] going to do here is we're going to frame the convolution as a simple matrix [00:44:09] the convolution as a simple matrix vector mathematical operation I'm going [00:44:12] vector mathematical operation I'm going to see that it's actually possible so [00:44:14] to see that it's actually possible so let's start with a 1d come for the 1d [00:44:26] let's start with a 1d come for the 1d convolution I will take an input X which [00:44:29] convolution I will take an input X which is of size 12 X 1 X 2 X 3 X 4 X 5 X 6 X [00:44:39] is of size 12 X 1 X 2 X 3 X 4 X 5 X 6 X 7 X 8 so 8 plus 2 padding which gives me [00:44:44] 7 X 8 so 8 plus 2 padding which gives me the 12 that I mentioned so the input is [00:44:48] the 12 that I mentioned so the input is a one-dimensional vector which has [00:44:52] a one-dimensional vector which has padding of 2 on both sides I will give [00:44:57] padding of 2 on both sides I will give it to a layer that will be a 1d comm and 
[00:45:01] it to a layer that will be a 1D conv, and this layer will have only one filter. The filter size will be four, and we will also use a stride equal to two. So my first question is: what's the size of the output? Can you guys compute it on your notepad and tell me the size of the output? Input size twelve (the twelve entries on the board are x1 through x8 plus two padded zeros on each side), filter of size four, stride of two, padding of two. Five? Yeah, I heard it, yeah. So remember, you use n_y equals n_x minus f plus 2p, divided by the stride, plus one, and you will get five. So what I'm going to get is y1, y2, y3, y4, y5.
[00:46:21] So I'm going to focus on this specific convolution for now, and I'm going to show that we can define it as a mathematical operation between a matrix and a vector. The way to do it, I guess the easiest way, is to write the system of equations that is underlying here. What is y1? y1 is the filter applied to the first four values here. Does it make sense? So if I define my filter as being w1, w2, w3, w4, what I'm going to get is that y1 = w1·0 + w2·0 + w3·x1 + w4·x2. This makes sense: it's just a convolution, an element-wise operation, and then you sum all of it.
[00:47:25] y2 is going to be the same thing, but we just slide down by the stride of two, so it's going to give me y2 = w1·x1 + w2·x2 + w3·x3 + w4·x4. Correct? Everybody's following? Now, same thing, we will do it for all the y's until y5, and we know that y5 is the element-wise operation between the filter and the four last numbers here, summed, so it will give me y5 = w1·x7 + w2·x8 + w3·0 + w4·0.
[00:48:28] Okay, now what we're going to do is try to write down y as a matrix-vector operation between W and x. We need to find what this W matrix is, and looking at the system of equations, it seems that it's not impossible.
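The numbers being worked out on the board can be checked directly; here is a minimal NumPy sketch, with made-up values for x and the filter w (the lecture never fixes them):

```python
import numpy as np

# Toy 1D example from the lecture: x has 8 entries, padding 2, filter 4, stride 2.
# The values of x and w are made up for illustration.
x = np.arange(1.0, 9.0)              # x1..x8
w = np.array([1.0, -2.0, 3.0, 0.5])  # w1..w4

f, p, s = len(w), 2, 2
n_y = (len(x) - f + 2 * p) // s + 1  # (8 - 4 + 4) / 2 + 1 = 5
x_pad = np.pad(x, p)                 # [0, 0, x1, ..., x8, 0, 0] -> 12 entries

# Slide the filter with stride 2: y_i = sum_k w_k * x_pad[s*i + k]
y = np.array([w @ x_pad[s * i : s * i + f] for i in range(n_y)])

print(n_y)   # 5
print(y[0])  # w3*x1 + w4*x2, since w1 and w2 hit the padded zeros
```

The first and last outputs only touch two real entries of x; the rest of the filter lands on the padded zeros, exactly as in the y1 and y5 equations above.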
So let's try to do it. I will write my y vector here, y1 through y5, I will write my matrix here, and my vector x here. So the first question is: what do you think will be the shape of this W matrix?
[00:49:17] Five by twelve, correct. We know that this y is five by one and this x is twelve by one, so of course W is going to be five by twelve. So now let's try to fill it in. The x vector is 0, 0, x1, x2, x3, all the way to x8, 0, 0. Can you guys see in the back? Yeah? Okay, cool. So I'm going to fill in this matrix according to the system of equations. I know that y1 should be w1·0 + w2·0 + w3·x1 + w4·x2, and this vector is going to multiply the first row here, so I just have to place my w's: w1 will come here to multiply the first 0, w2 will come here, w3 will come here, and w4 will come here, and all the rest will be filled in with zeros, right? I don't want any more multiplications. How about the second row of this matrix? I know that y2 has to be equal to the dot product of this row with this vector, and I know that it's going to give me w1·x1 + w2·x2 + w3·x3 + w4·x4. x1 is the third entry of this vector, so I need to shift what I had in the previous row by the stride of two, and that will give me the second row. Does it make sense? If I take the dot product of this row with that, I should get the second equation up there, and so on. And you understand what happens, right? This pattern, we just slide it over by two each time, so I get zeros here, then my w1, w2, w3, w4, and then zeros, and all the way down here what I get is zeros and then w1, w2, w3, w4 ending in the bottom-right corner. So the only thing I want to mention here is that the convolution operation, as you see, can be framed as a simple matrix times a vector. Yes?
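The banded matrix being filled in on the board can be built programmatically; a sketch with the same made-up values as before, where each row is the filter shifted right by the stride:

```python
import numpy as np

# Build the 5x12 matrix W so that y = W @ x_pad reproduces the strided
# convolution (values of x and w are made up for illustration).
x = np.arange(1.0, 9.0)
w = np.array([1.0, -2.0, 3.0, 0.5])
f, p, s = 4, 2, 2
x_pad = np.pad(x, p)                   # 12 entries
n_y = (len(x) - f + 2 * p) // s + 1    # 5

W = np.zeros((n_y, len(x_pad)))        # 5 x 12
for i in range(n_y):
    W[i, s * i : s * i + f] = w        # each row is w shifted by the stride

# Direct strided convolution for comparison
y_conv = np.array([w @ x_pad[s * i : s * i + f] for i in range(n_y)])
assert np.allclose(W @ x_pad, y_conv)  # the conv really is a matrix-vector product
```

The first row starts with w1..w4 followed by zeros, and the last row ends with w1..w4 in the bottom-right corner, matching the board.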
The question is: in the top row, should the zeros be on the left? Why are the zeros on the right side? It's because I don't want y1 to be dependent on x3 through x8, so I want those entries to be multiplied by zeros.
[00:52:00] Okay, so why is this important? The intuition behind the deconvolution, and the reason the deconvolution exists, is that if we managed to write down y = Wx, we can probably write down x = W^-1 y, if W is an invertible matrix, and this is going to be our deconvolution. And in fact, what's the shape of this new matrix?
[00:52:45] Yes, twelve by five: we have twelve by one on one side and five by one on the other, so it has to be twelve by five. So it's flipped compared to W.
[00:52:58] So one thing we're going to do here is make an assumption. The first assumption is that W is an invertible matrix, and on top of that we're going to make a stronger assumption, which is that W is an orthogonal matrix. And without going into the details
here, the same as when we proved Xavier initialization in section, we are making assumptions that are not always true; this assumption is not always going to be true either. One intuition you can have: suppose the filter is an edge detector, like +1, 0, 0, -1. In this case the matrix can be orthogonal. Why? A matrix being orthogonal means that if I take two of the columns and dot-product them together, it should give me zero, and same with the rows. You can see it: what's interesting is that if the stride were four, there would be no overlap between two consecutive rows at all, which would give me an orthogonal matrix here. But let's try these two rows: if I replace the w's by +1, 0, 0, -1, you can see that the dot product would be zero; the zeros multiply the ones and the ones multiply the zeros, giving me a zero dot product. So this is a case where it works; in practice it doesn't always work. The reason we're making this assumption is that we want to compute a reconstruction, right? We want to be able to invert this W, and the reconstruction is not going to be exact, but as a first-order approximation we can assume that the reconstruction will still be useful to us, even if the assumption is not always true. In the case where W is orthogonal, I know that the inverse of W is W transpose; another way to write it is that, for orthogonal matrices, W^T W is the identity matrix. So what it tells me is that x is going to be W^T y.
[00:55:40] So let's see what we get from that; let me write it down. Let's say now we have our y and we want to regenerate our x using this method.
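The shapes of the reconstruction step x = W^T y are easy to sanity-check in code; note that with an arbitrary made-up filter W is not actually orthogonal, so this is only the approximate reconstruction the lecture describes:

```python
import numpy as np

# Reconstruction under the (strong) orthogonality assumption: x_hat = W.T @ y.
# With a generic filter W is NOT orthogonal, so this is only approximate;
# the point here is the shapes and the structure, not an exact inverse.
x = np.arange(1.0, 9.0)
w = np.array([1.0, -2.0, 3.0, 0.5])
f, p, s = 4, 2, 2
x_pad = np.pad(x, p)
n_y = (len(x) - f + 2 * p) // s + 1

W = np.zeros((n_y, len(x_pad)))
for i in range(n_y):
    W[i, s * i : s * i + f] = w

y = W @ x_pad       # forward conv: (5,12) @ (12,) -> (5,)
x_hat = W.T @ y     # "deconv": (12,5) @ (5,) -> (12,), same size as x_pad
print(W.T.shape, x_hat.shape)
```

W^T is twelve by five, flipped compared to W, exactly as answered in the lecture.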
What I would write is this: to understand the 1D deconv, we can use the following illustration, where we have x here, which is 0, 0, x1, x2, x3, all the way down to x8, and I have my matrix W^T here and my y vector, y1 through y5, here. I know that this matrix is the transpose of the one I had before, so I can just write down the transpose: it will be w1, w2, w3, w4 written downward, shifted by the stride of two from one column to the next, and so on, and this whole thing will be W^T.
[00:57:41] So the small issue here is that this, in practice, is going to be very similar to a convolution, but a tiny bit different in terms of implementation. So another question I might ask is: how can we do the same thing with the same pattern as we have here, meaning with the stride going from left to right instead of from top to bottom?
[00:58:11] I'm going to introduce that with a technique called the sub-pixel convolution. For those of you who read papers in segmentation or in visualization, oftentimes this is the type of convolution that is used for reconstruction. So let's see how it works. I want to do the same operation, but instead of doing it with a stride going from top to bottom, I want to do it with a stride going from left to right.
[00:58:47] One thing you want to notice here is that the two lines that I wrote here are cropped, and the reason is that we're using a padded input: we just crop the two top lines, and the same for the two last lines, they will be cropped. Look at that: this w1 would multiply y1, this one would multiply y2, and so on, so this dot product would give me w1 times y1; but I don't want that to happen, because I want to get the padded zero here. So we just drop that row, and this matrix is actually going to be smaller than it seems; it's going to generate my x1 through x8, and then I will pad the top values and the bottom values. Okay, it's just a hack.
[00:59:46] So let's look at the sub-pixel convolution. I have my input, and now we do something quite fun: I will perform a sub-pixel operation on y. What does that mean? I will insert zeros almost everywhere: I will get 0, 0, y1, 0, y2, 0, y3, 0, y4, 0, y5, 0, 0. So this vector is just the vector y with some zeros inserted around it, and also in the middle, between the elements of y. Now why is that interesting? It's interesting because I can now write down my deconvolution as a convolution by flipping my weights.
[01:01:08] So let me explain a little bit what happened here. What we wanted, in order to be able to efficiently compute the deconvolution the same way as we've learned to compute the convolution, is
to have the weights scattered from left to right, with the stride moving from left to right. What we did is that we used a sub-pixel version of y, by inserting zeros in the middle, and we divided the stride by two: instead of the stride of two we had in our convolution, we have a stride of one in our deconvolution. So notice that I shift my weights by one at every step when I move from one row to another. The second thing is that I flipped my weights: instead of having w1, w2, w3, w4, I now have w4, w3, w2, w1. And look at this first row that is not cropped: the result of the dot product of this row with this vector is going to be y1·w3 + y2·w1. Yeah? Now let's look at what happened over here. I look at my first row here, and the dot product of this first row with my y is going to be, sorry, these two are cropped, the same as here. So taking my first non-cropped row here as a dot product with this vector, what I get is w3·y1 + w1·y2, exactly the same thing as I got there. So these two operations are exactly the same: they're the same thing, you get the same result, it's two different ways of doing it. One is using a weird operation with strides going from top to bottom, and the second one is exactly a convolution: a convolution with flipped weights, plus insertion of zeros for the sub-pixel version of y,
[01:03:41] and on top of that, padding here and there. So this was the hardest part, okay? Does it give you more intuition about the deconvolution? You now know how a convolution can be framed as a mathematical operation between a matrix and a vector, and you also know that, under these assumptions, the way we deconvolve is just by flipping our weights, dividing the stride by two, and inserting zeros. If we just do that, we're deconvolving. For a convolution that forward-propagates in the usual way, if you want to deconvolve, just flip all the weights, insert zeros (the sub-pixel step), and finally divide the stride, and that's the deconvolution. It's a complex thing to understand, but this is the intuition behind it. Now let's try to get an intuition of how it works in two dimensions; let me write it down.
[01:04:54] Why do we use this? Because in terms of implementation, this one is exactly the same operation as the convolution, just with flipped weights, insertion of zeros, and a divided stride, while the other one is a different implementation. You could do both, it's the same operation, but in practice this one is easier to understand. That's why I wanted to show it.
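The equivalence just derived, that W^T y equals a plain stride-1 convolution of the zero-stuffed y with the flipped filter, up to cropping the rows that correspond to the padded positions, can be verified numerically; filter and y values are made up:

```python
import numpy as np

# Two implementations of the 1D "deconv" from the lecture, compared:
# (1) the transposed-matrix version W.T @ y, and
# (2) a stride-1 convolution of the zero-stuffed ("sub-pixel") y with the
#     FLIPPED filter.  Values are made up for illustration.
w = np.array([1.0, -2.0, 3.0, 0.5])
y = np.array([2.0, -1.0, 0.5, 4.0, 3.0])  # some conv output y1..y5
f, s, n_pad = 4, 2, 12                    # filter size, original stride, len of padded x

W = np.zeros((len(y), n_pad))
for i in range(len(y)):
    W[i, s * i : s * i + f] = w
x_hat = W.T @ y                           # transposed-matrix version

# Sub-pixel version: [0, 0, y1, 0, y2, 0, y3, 0, y4, 0, y5, 0, 0]
y_up = np.zeros(2 * len(y) + 3)
y_up[2:2 + 2 * len(y):2] = y
w_flip = w[::-1]                          # w4, w3, w2, w1
z = np.array([w_flip @ y_up[j : j + f] for j in range(len(y_up) - f + 1)])

# Interior entries (the x1..x8 positions) agree between the two versions.
assert np.allclose(x_hat[2:10], z[1:9])
```

The cropped rows at the top and bottom are exactly the entries dropped by the slicing in the final comparison.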
[01:05:24] When the assumption doesn't hold? Yeah, so oftentimes the assumption doesn't hold, but what we want is to be able to see a reconstruction, and if we use this method we will still see a reconstruction in practice. If we really had W^-1, the reconstruction would be much better, but we don't. So let me go over the 2D example; we're going to run a little over time, because we technically have two hours for one hour and fifty minutes of lecture. Let me go over the 2D example, and then we will answer this question of why we need to make this assumption.
[01:06:06] So here is the interpretation of the 2D deconvolution; let me write it down here.
[01:06:17] The intuition behind the 2D deconv is this: I get my input, which is 5 by 5, and I call it x. I forward-propagate it using a filter of size 2 by 2 in a conv layer, and a stride of 2. This is my convolution. What do I get? You do 5 minus 2, plus the padding, which is 0, divided by 2, plus 1 (oh, I forgot the plus 1 here), and you floor it. So 5 minus 2 gives you 3, divided by 2, plus 1; no, actually it will give you 3 by 3, yeah, a y of 3 by 3, that's what you get. And now this, you call it y. What you're going to do here is deconvolve y. In order to deconvolve it, you're going to use a stride of 1, because what we said is that we need to divide the stride by 2, right? So we need a stride of 1, and the filter will be the same, 2 by 2. And you remember that what we've seen is that the filter is the same, it's just going to be flipped: so you will use a filter of 2 by 2, but flipped.
[01:07:54] And now what do we get? We hope to get a 5 by 5 output, which is going to be our reconstructed x, a 5 by 5 input, and the way we're going to do it, this is the intuition behind it. Yeah? Okay, up to
my 2, thanks. Yeah, 5 by 5 here, that's what we hope to reconstruct. The way we will do it: we take the filter, which is 2 by 2, we put it here, and we multiply all the weights of this filter by y11; all the weights get multiplied by y11, so we get four values here, which are going to be w4·y11, w3·y11, and so on. Now I shift this with a stride of 1, I put my filter again here, and I multiply all the entries by y12, and so on. And you see that this entry has an overlap, so it will be updated at every step of the convolution; it's not like what happened in the forward pass. So this is the intuition behind the 2D deconvolution.
[01:09:21] In 3D, same thing: you have a volume here, so your filter is going to be a volume. What you're going to do is put the volume here, multiply all the weights of the filter by y111, and so on; and then if you have a second filter, you put it again on top and multiply all its weights the same way. It's a little complicated, but this is the intuition behind the deconvolution. Okay, let's get back to the lecture; I'm going to take one question here if you guys need clarification.
[01:09:59] No worries if you don't understand the deconvolution fully; the important part is that you get the intuition and you understand how we use it. So let me make a comment: why do we need to make this assumption, and when do we need to make it? When we want to reconstruct, as we're doing here in the visualization, we need to make this assumption because we don't want to retrain weights for the deconvolutional network. What we know is that the activation we selected here on the feature map has gone through the entire pipeline of the convnet, so to reconstruct, we need to use the weights that we already have in the convnet.
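The 2D intuition above, stamping a copy of the flipped filter scaled by each entry of y and summing the overlaps, can be sketched as a scatter loop; sizes and values here are made up, and the exact output size depends on the padding convention, which the board drawing leaves implicit:

```python
import numpy as np

# Overlap-add ("scatter") view of the 2D deconvolution sketched on the board:
# every entry y[i, j] stamps a copy of the flipped filter, scaled by y[i, j],
# into the output, and overlapping stamps accumulate.
def deconv2d_scatter(y, w_flip, stride):
    h, ww = y.shape
    f = w_flip.shape[0]
    out = np.zeros(((h - 1) * stride + f, (ww - 1) * stride + f))
    for i in range(h):
        for j in range(ww):
            # overlapping regions are updated at every step, unlike the forward pass
            out[i * stride:i * stride + f, j * stride:j * stride + f] += y[i, j] * w_flip
    return out

y = np.arange(1.0, 10.0).reshape(3, 3)     # a 3x3 feature map
w = np.array([[1.0, -1.0], [0.5, 2.0]])
x_hat = deconv2d_scatter(y, w[::-1, ::-1], stride=1)
print(x_hat.shape)                          # (4, 4) with stride 1; padding would grow it
```

With stride 1, interior output entries receive contributions from up to four stamps, which is the overlap the lecturer points out.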
We need to pass them to the deconvolution and reconstruct. If we're doing segmentation, like we talked about earlier, we don't need this assumption: we're just saying that this procedure is the deconvolution, and we will train the weights of the deconvolution, so there is no need to make the assumption. It's just a technique that divides the stride and inserts zeros, and then we train the weights and we get an output that is an upsampled version of the input that was given to it. So there are two use cases, one where you reuse the weights and one where you don't; in this case we don't want to retrain, we want to use the weights. So let's see a more visual version of the upsampling. We do the sub-pixel image: this is my image, 4 by 4; I insert zeros and I pad it, and I get a 9 by 9 image, and I have my filter.
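This visual version, a 4 by 4 image with zeros inserted and padding giving 9 by 9, then an ordinary stride-1 convolution giving a 6 by 6 output, can be sketched as follows (image and filter values are made up):

```python
import numpy as np

# Sub-pixel upsampling as described: zero-stuff a 4x4 image, pad it to 9x9,
# then run an ordinary stride-1 convolution with a 4x4 filter to get the
# 6x6 upsampled output.  Image and filter values are made up.
img = np.arange(1.0, 17.0).reshape(4, 4)
w = np.ones((4, 4))

up = np.zeros((7, 7))      # insert zeros between the 4x4 pixels
up[::2, ::2] = img
up = np.pad(up, 1)         # pad by one on each side -> 9x9

f = w.shape[0]
out = np.array([[np.sum(w * up[i:i + f, j:j + f])
                 for j in range(up.shape[1] - f + 1)]
                for i in range(up.shape[0] - f + 1)])
print(up.shape, out.shape)  # (9, 9) (6, 6)
```

Each output entry only sees the non-zero image pixels under the window, which is why different subsets of filter weights (the colors on the slide) generate different output positions.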
filter like that [01:11:31] by nine image I have my filter like that and this filter will convolve I will it [01:11:35] and this filter will convolve I will it would convolve over the input so I would [01:11:36] would convolve over the input so I would place it on my input and at every step I [01:11:39] place it on my input and at every step I would perform a convolution up I will [01:11:41] would perform a convolution up I will get a value here the value is blue [01:11:43] get a value here the value is blue because as you can see the weights that [01:11:44] because as you can see the weights that affected the output were only the blue [01:11:47] affected the output were only the blue weights I would use a stride of one beam [01:11:51] weights I would use a stride of one beam now the weights that affect my input are [01:11:53] now the weights that affect my input are the green ones and so on and I would [01:11:56] the green ones and so on and I would just come valve as I do usually and so [01:12:02] just come valve as I do usually and so on and now one step down I see that the [01:12:05] on and now one step down I see that the weights that are impacting my input are [01:12:07] weights that are impacting my input are the purple ones so I would put a purple [01:12:10] the purple ones so I would put a purple here and so on so I just do the [01:12:12] here and so on so I just do the convolution like that and so so one [01:12:17] convolution like that and so so one thing that is interesting here is that [01:12:19] thing that is interesting here is that the values that are blue in my out 6x6 [01:12:22] the values that are blue in my out 6x6 output were generated only using the [01:12:26] output were generated only using the blue values of the filter the blue [01:12:28] blue values of the filter the blue weights in the filter the ones that are [01:12:32] weights in the filter the ones that are green were only used you were only [01:12:34] green were only used you were 
only generated using the green values of my [01:12:36] generated using the green values of my filter so actually this subsample [01:12:38] filter so actually this subsample sub-pixel [01:12:39] sub-pixel convolution or deconvolution could have [01:12:42] convolution or deconvolution could have been done with for convolutions with the [01:12:46] been done with for convolutions with the blue weights green weights purple white [01:12:49] blue weights green weights purple white sand yellow weights and then just just [01:12:52] sand yellow weights and then just just replaced such that the adjustment would [01:12:56] replaced such that the adjustment would be the output [01:12:57] be the output just put the output of each of these [01:13:00] just put the output of each of these comp and mix them to give out a 6x6 [01:13:03] comp and mix them to give out a 6x6 output only thing you need to know we [01:13:05] output only thing you need to know we have an input 4x4 and we get an output [01:13:07] have an input 4x4 and we get an output 6x6 that's what we wanted we wanted to [01:13:09] 6x6 that's what we wanted we wanted to of sample the image we can retrain the [01:13:11] of sample the image we can retrain the weights or use the transport version of [01:13:13] weights or use the transport version of them so let's see what happens now we [01:13:15] them so let's see what happens now we understood what what the curve was doing [01:13:18] understood what what the curve was doing so we're able to decomp what we need to [01:13:20] so we're able to decomp what we need to do is also to ampoule and to unreal ooh [01:13:24] do is also to ampoule and to unreal ooh fortunately it's easier than the decomp [01:13:26] fortunately it's easier than the decomp so we're not going to do board work [01:13:27] so we're not going to do board work anymore so let's see how uncool works if [01:13:31] anymore so let's see how uncool works if I give you this input to the pool link [01:13:34] I give 
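As an aside, the zero-insertion picture above is easy to code up. This is a toy numpy sketch, not the lecture's code — the function name, sizes, and the all-ones filter are ours, chosen to match the board: a 4x4 input becomes 9x9 after zero-insertion and padding, and a 4x4 filter at stride 1 yields a 6x6 upsampled output.

```python
import numpy as np

def upsample_transposed_conv(x, filt, pad=1):
    """Upsample x by zero-insertion, then run an ordinary valid convolution
    (the subpixel / transposed-convolution picture from the lecture)."""
    h, w = x.shape
    # Insert a zero between every pair of neighbouring pixels: 4x4 -> 7x7.
    dil = np.zeros((2 * h - 1, 2 * w - 1))
    dil[::2, ::2] = x
    # Pad the border: 7x7 -> 9x9.
    dil = np.pad(dil, pad)
    kh, kw = filt.shape
    oh, ow = dil.shape[0] - kh + 1, dil.shape[1] - kw + 1  # 6x6 here
    out = np.zeros((oh, ow))
    for i in range(oh):          # ordinary convolution, stride 1
        for j in range(ow):
            out[i, j] = np.sum(dil[i:i + kh, j:j + kw] * filt)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
filt = np.ones((4, 4))           # a 4x4 filter, as on the board
y = upsample_transposed_conv(x, filt)
print(y.shape)  # (6, 6)
```

Note that a 4x4 filter at stride 1 over the zero-inserted image is exactly four interleaved 2x2 sub-filters, which is the blue/green/purple/yellow decomposition described above.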
[01:13:37] If I give this input to a max pooling layer, the output is obviously going to be this one: 42 is the maximum of these four numbers, assuming we're using a 2x2 filter with a stride of two vertically and horizontally; 12 is the maximum of the green numbers, 6 is the maximum of the red numbers, and 7 of the orange ones. Now, a question: I give you back the output and I tell you, give me the input. Can you give me the input or not? No — why? You only kept the maximum, so you lost all the other numbers: I don't know the 0, 1 and -1 anymore, the red numbers here, because they didn't pass through the maximum. So max pooling is not invertible from a mathematical perspective; what we can do is approximate its inverse. How can we do that? Spread it out — that's a good point, we could spread the 6 among the four values, and that would be an approximation. A better way, if we managed to cache some values, is to cache something we call the switches: we cache the positions of the maxima using a matrix of zeros and ones, which is very cheap to store, and we pass it to the unpooling. Now we can approximate the inverse, because we know where 6 was, where 12 was, where 42 and 7 were. But it's still not invertible, because we lost all the other numbers.

[01:15:08] Think about max pool backpropagation — it's exactly the same thing. These numbers 0, 1, -1 had no impact on the loss function at the end, because they didn't pass through the forward propagation. So with the switches you actually get the exact backpropagation: you know that the gradients of the other values are going to be zeros, because they didn't affect the loss during the forward propagation. Does that make sense? OK, so this is pooling and max pooling, and with the switches we can approximately unpool.

[01:15:42] Yeah? Why don't we just cache the whole original input? We could cache the entire thing, but in terms of backpropagation, in terms of efficiency, we would just use the switches, because they're enough for unpooling. You're right that we could cache everything, but then it's cheating: you kept the input, so you just give it back.

[01:16:07] OK, so now we know how unpooling works; let's look at the ReLU. What we need to do, in fact, is to pass the switches and the filters back to the unpool and to the deconv in order to reconstruct: the switches are the matrices of zeros and ones indicating where the maxima were, and the filters are the filters that I will transpose, under the assumption on the board. And so on, and so on, and I get my reconstruction. I just need to explain the ReLU now. I give you this input to a ReLU and I forward-propagate it.
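The switches idea can be sketched in a few lines of numpy. The function names and the toy 4x4 input here are ours, chosen so the four maxima come out to 42, 12, 6 and 7 as on the slide:

```python
import numpy as np

def maxpool_with_switches(x):
    """2x2 max pooling with stride 2, also returning the 'switches':
    a 0/1 mask marking where each maximum came from."""
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    switches = np.zeros_like(x)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            win = x[i:i + 2, j:j + 2]
            out[i // 2, j // 2] = win.max()
            di, dj = np.unravel_index(win.argmax(), win.shape)
            switches[i + di, j + dj] = 1
    return out, switches

def unpool(pooled, switches):
    """Approximate inverse: put each max back where the switches say it
    was, zeros everywhere else (the non-maxima are lost for good)."""
    up = switches.copy()
    h, w = switches.shape
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            up[i:i + 2, j:j + 2] *= pooled[i // 2, j // 2]
    return up

x = np.array([[1., 42., 5., 12.],
              [0.,  3., 2.,  7.],
              [6., -1., 4.,  7.],
              [0.,  1., 6.,  5.]])
pooled, sw = maxpool_with_switches(x)   # pooled is [[42, 12], [6, 7]]
recon = unpool(pooled, sw)              # maxima restored, zeros elsewhere
```

Note that `sw` is exactly the mask that max pool backpropagation uses to route gradients, which is why the switches give the exact backward pass for free.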
[01:16:41] What do we get? All the negative numbers are going to be set to 0, and the others are going to be kept. Now let's say I'm doing a backpropagation through the ReLU. If I give you the gradients that are coming back and I ask you: what are the gradients after the ReLU during the backpropagation — how does the ReLU behave in backprop? Zeros. Which ones are zeros? The negatives. Do you agree that the entries that were negative in this yellow matrix are going to give zeros during the backprop? Always think about what the influence of the input was on the loss function, and you will find out what the backpropagation looks like. Look at this number here, -2: did the fact that it was -2 have any influence on the loss function? No — it could have been -10, it could have been -20, it's not going to impact the loss function. So what do you think should be the number here? Zero, even if the gradient that is coming back is 10.

[01:18:16] Same idea as max pooling: what we need to do is remember the switches — remember which of these values had an impact on the loss. We pass the switches; all these values here that are whited out had no impact on the loss function, so when you backpropagate, their gradient should be set to zero. It doesn't matter to update them, it's not going to make the loss go down. So these are all zeros, and the rest just pass through. Why do they pass with the same value? Because the derivative of ReLU for positive numbers is one; this number here that passed the ReLU during forward propagation was not modified, so its gradient passes through unchanged. That makes sense. So this is ReLU backward.

[01:19:04] Now, in this reconstruction method we're not going to use ReLU backward; we're going to use something we call the ReLU deconvnet. The intuition for why we're not using ReLU backward is that what we're interested in is knowing which pixels of the input positively affected the activation that we're talking about; so what we're going to do is just apply a ReLU, not a ReLU backward. Another reason is that when we reconstruct, we want minimal influence from the forward propagation: we don't really want our reconstruction to depend on the forward pass, we would like it to be unbiased — just look at this activation and reconstruct what happened. So that's what we're going to use. Again, this is a hack that was found through trial and error, and it's not going to be scientifically viable all the time.
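The distinction between ReLU backward and the deconvnet ReLU is easiest to see side by side. A minimal numpy sketch (the names are ours, not from any library):

```python
import numpy as np

def relu_backward(grad, fwd_input):
    """True backprop: zero the signal wherever the *forward input* was
    negative, since those entries never influenced the loss."""
    return grad * (fwd_input > 0)

def relu_deconvnet(grad):
    """Deconvnet rule: just apply a ReLU to the signal coming back,
    ignoring the forward pass, so the reconstruction is not biased by it."""
    return np.maximum(grad, 0)

fwd = np.array([[1., -2.], [3., -4.]])   # forward-pass inputs to the ReLU
g = np.array([[10., 10.], [-5., 7.]])    # signal flowing back

b = relu_backward(g, fwd)     # zeros where fwd was negative
d = relu_deconvnet(g)         # zeros where the backward signal is negative
```

With these inputs, `relu_backward` keeps 10 and -5 (the positions where `fwd` was positive), while `relu_deconvnet` keeps 10, 10 and 7 (the positive parts of the backward signal itself) — the deconvnet keeps only what contributes positively to the chosen activation.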
[01:20:00] OK, so now we can do everything, and we can reconstruct and find out what an activation corresponds to. It took time to understand, but it's super fast to do — just one pass, not iterative — and we could do it with every layer. Let's say we do it with the first block of conv, ReLU, max pool: I go here, I choose an activation, I find the maximum activation, I set all the others to zero, I unpool, I deconv, and I find the reconstruction — this activation was looking at edges like that.

[01:20:33] So let's delve into the details and see how we can visualize what's happening inside the network. All the visualizations we're going to see now can be found in Matthew Zeiler and Rob Fergus's paper "Visualizing and Understanding Convolutional Networks". I'm going to explain what they correspond to, but check out their paper if you want to understand more of the details.

[01:20:53] What happens here is that on the top left you have nine pictures: these are the cropped pictures from the dataset that activated the first filter of the first layer the most. We have a first filter of the first layer, we ran the whole dataset through, and we recorded the main pictures that activate this filter; these were the main ones. We did the same thing for all the filters of the first layer, and there are nine by nine of them — a lot of them, I think. At the bottom here you have the filters themselves, whose weights were plotted: just take the filter and plot the weights. This is important: it is interpretable only for the first layer; when you go deeper in the network, the filter itself cannot be interpreted, it's super hard to understand. Here, because the weights directly multiply the pixels, the first layer weights can be interpreted. In fact, let's look at the third filter here on the first row: it has weights that look like one of the diagonals, and if you look at the data that maximized this filter's activation — the feature map corresponding to this filter — they're all cropped images that correspond to diagonals. That's what happens.

[01:22:17] Now, the deeper we go, the more fun we have. So let's go to the results on a validation set of 50,000 images. What happened here is that they took 50,000 images, forward-propagated them through the network, and recorded which image maximized the activation of the feature map corresponding to the first filter of layer two, the second filter, and so on for all the filters. Let's look at one of them: we have a circle on this one; it means that
the filter which generated the feature map corresponding to this image was activated by, probably, a wheel or something like that: the image of the wheel was the one that maximized the activation of this feature map, and then we used the deconvnet method to reconstruct it. Any questions on that? Yeah? Good question: what if the activation function is not a ReLU? In practice you would just use its backward to reconstruct; if it's tanh, you would use the same type of method and try to approximate the reconstruction.

[01:23:31] OK, let's go a little deeper. Same thing for layer two: forward-propagate all the images of the dataset, find the nine images that lead to the maximum activation of the first filter; these are plotted on top here. What you can see, for this filter on the sixth row, first column, is that the features are more invariant to small changes: this filter was actually activated by many different types of circles, spirals, wheels — it's still activated although the circles have different sizes.

[01:24:04] We can go even deeper, up to the third layer. What's interesting is that the deeper you go, the more complexity you see: at the beginning we were seeing only edges, now we see much more complex figures. You can see a face here in this entry; it means that this filter activated when it saw a data point that had this face, and then we reconstructed it and cropped it to the face. The face is kind of red: the more red it is, the more activation it led to. And same thing for the top nine images for layer three: these are the nine images that actually led to the face, the ones that maximize the activation of the feature map corresponding to that filter, and so on.
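The bookkeeping just described — run every image through and keep the ones that maximize one filter's feature map — can be sketched like this (toy random data and a single-channel valid convolution, just to show the procedure; the names and sizes are ours):

```python
import numpy as np

def feature_map_max(img, filt):
    """Maximum activation of one filter's feature map (valid convolution)."""
    kh, kw = filt.shape
    h, w = img.shape
    best = -np.inf
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            best = max(best, np.sum(img[i:i + kh, j:j + kw] * filt))
    return best

rng = np.random.default_rng(0)
dataset = [rng.standard_normal((8, 8)) for _ in range(50)]  # stand-in images
filt = rng.standard_normal((3, 3))                          # one filter

# Score every image by how strongly it activates this filter's feature map,
# then keep the nine strongest, as in the Zeiler-Fergus figures.
scores = [feature_map_max(img, filt) for img in dataset]
top9 = np.argsort(scores)[-9:][::-1]
```

In the paper these top images are then pushed back through unpool / deconvnet-ReLU / transposed filters to show which pixels drove the activation.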
[01:25:56] normalization layers. We can switch back and forth between showing the actual activations and showing images synthesized to produce high activations. Here he is giving his own image to the network; by the time we get to the fifth convolutional layer, the features being computed represent abstract concepts. For example, this neuron seems to respond to faces. We can further investigate this neuron by showing a few different types of information. First, we can artificially create optimized images using new regularization techniques; these confirm the suspicion that this neuron fires in response to a face. We can also show the images from the training set that activate this neuron the most, as well as the pixels from those images most responsible for the high activations, computed via the deconvolution. The deconv brings out the feature's response to multiple faces in different locations, and by looking at it we can see that it would respond more strongly if we had even darker eyes and rosier lips. We can also confirm that it cares about the head and shoulders but ignores the arms and torso. We can even see that it fires, to some extent, for cat faces. Using backprop or deconv, we can see that this unit depends most strongly on a couple of units in the previous conv layer, and on about a dozen or so in the conv layer before that.

[01:27:16] Let's look at another neuron. What is this unit doing? From the top nine images we might conclude that it fires for different types of clothing, but examining the synthetic images shows that it may be detecting not clothing per se but wrinkles. In the live plot we can see that it's activated by my shirt, and smoothing out half of my shirt causes that half of the activations to decrease. Finally, here's another interesting neuron: this one has learned to look for printed text in a variety of sizes, colors and fonts. This is pretty cool, because we never asked the network to look for wrinkles or text or faces; the only labels we provided were at the very last layer. So the only reason the network learned features like text and faces in the middle was to support final decisions at that last layer. For example, the text detector may provide good evidence that a rectangle is in fact a book seen on edge, and detecting many books next to each other might be a good way of detecting a bookcase, which was one of the categories we trained the net to recognize. In this video we've shown some of the features of the Deep Visualization Toolbox and a few of the things we've learned by
using it you can download it [01:28:24] learned by using it you can download it yeah so they have a toolbox which is [01:28:27] yeah so they have a toolbox which is exactly what you need right here and you [01:28:29] exactly what you need right here and you could test the toolbox on your model [01:28:32] could test the toolbox on your model takes time to get get it to run but but [01:28:35] takes time to get get it to run but but if you want to visualize all the neurons [01:28:37] if you want to visualize all the neurons it's very helpful okay so let's go [01:28:41] it's very helpful okay so let's go quickly we'll spend about three minutes [01:28:43] quickly we'll spend about three minutes on the optional deep dream one cause [01:28:45] on the optional deep dream one cause it's fun and yeah feel free free to jump [01:28:49] it's fun and yeah feel free free to jump in and ask questions so the Google and [01:28:56] in and ask questions so the Google and the page the blog post is by Alexander [01:29:00] the page the blog post is by Alexander morte Vince F the idea here is to [01:29:02] morte Vince F the idea here is to generate art using this knowledge of [01:29:04] generate art using this knowledge of visualization and how they do that is [01:29:07] visualization and how they do that is quite interesting then we take an input [01:29:10] quite interesting then we take an input for propagated to the network and I took [01:29:14] for propagated to the network and I took specs to declare that we called the [01:29:16] specs to declare that we called the Dreamliner then we'll take the [01:29:19] Dreamliner then we'll take the activation and set the gradient to be [01:29:21] activation and set the gradient to be equal to these activations the gradient [01:29:24] equal to these activations the gradient at this layer and then back propagate [01:29:25] at this layer and then back propagate the gradient uniqua so earlier what we [01:29:29] the gradient uniqua so earlier what we 
do is define a new objective function equal to an activation, and we try to maximize that objective function. Here they do it even more strongly: they take the activations and set the gradients to be equal to the activations, so the stronger the activation, the stronger it is going to become later on, and so on. They are trying to see what the network is activating for, and to increase that activation even further. [01:29:57] So: forward propagate the image, set the gradient of the dream layer to be equal to its activations to exaggerate them, back propagate all the way back to the input, and update the pixels of the image. Do that several times; every time, the activations will change, so you have to set the new activations as the gradients of the dream layer again and back propagate. After a few iterations you would see things happening. It's hard to see here on the screen.
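The loop just described can be sketched numerically. This is a minimal toy version, not the Google implementation: a single ReLU layer stands in for the dream layer, and `W`, the sizes, and the learning rate are made-up stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))  # stand-in weights for the layers up to the dream layer

def dream_step(x, lr=0.05):
    """One DeepDream-style update of the input pixels x."""
    a = np.maximum(W @ x, 0.0)         # forward propagate to the dream layer (ReLU)
    grad_a = a                         # set the layer's gradient equal to its activations
    grad_x = W.T @ (grad_a * (a > 0))  # back propagate through ReLU and the linear map
    return x + lr * grad_x             # gradient ascent on the image pixels

x = rng.normal(size=64)
before = np.linalg.norm(np.maximum(W @ x, 0.0))
for _ in range(20):                    # repeat; activations change, so recompute each time
    x = dream_step(x)
after = np.linalg.norm(np.maximum(W @ x, 0.0))
print(after > before)                  # the dream layer's activations get exaggerated
```

Each iteration recomputes the activations and re-seeds the gradient with them, which is exactly the "you have to set it again every time" point above.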
But you would have a pig appearing here, you'd have a tree somewhere there, and a lot of animals are going to start appearing in this cloud. [01:30:29] It's interesting, because it means, let's say you see this cloud here: if the network thought that this cloud looked a little bit like a dog, then one of the feature maps, the one generated by the filter that detects dogs, would activate a little bit, and because we set the gradient to be equal to the activation, that is going to increase the appearance of the dog in the image, and so on, and then you will see a dog appearing after a few iterations. It's quite fun, and if you zoom in you see that type of thing: you see a big snail, a kind of pig with a snail carapace, a camel-bird, a dog-fish. [01:31:08] I advise you to look at this on the slides rather than on the screen, but it's quite fun.
And same, if you give that type of image: because the network thought there was something a little bit like a tower in it, you will increase the network's confidence in the fact that there is a tower by changing the image, and the tower will come out, and so on. It's quite cool. [01:31:34] And if you dream in lower layers, you will obviously see edges appearing or patterns coming out, because the lower layers tend to detect edges, and then you increase the network's confidence in an edge, so it will basically create an edge on the image. [01:31:53] [DeepDream on video] [Music] [Applause] [01:32:32] You get something trippy on the side. So one insight that is fun about it (and this is not only for DeepDream, it also holds for most of the gradient ascent stuff): let's say we have an output score for the dumbbell class, and we define our objective function to be the dumbbell score,
and we try to find the image that maximizes it. We will see something like this. The interesting thing is that the network thinks that a dumbbell is a hand with a dumbbell, not only the dumbbell, and you can see it here: you see the hands. The reason is that it has never seen a dumbbell alone; probably in the training set there is no picture of a dumbbell alone in a corner with nothing else around it. Instead, it's usually a human lifting it. [01:33:23] Okay, so just to summarize what we've learned today, we are now able to answer all the following questions. What is the role of a given neuron, feature map, or layer? Deconvolution to reconstruct, searching the data set for the top images, and gradient ascent. Can we check what the network focuses on? Occlusion sensitivity, saliency maps, class activation maps. How does the network see our world? I would say gradient ascent, maybe DeepDream, those cool things.
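The dumbbell experiment is plain gradient ascent on a class score with respect to the input. Here is a hedged sketch with a toy linear classifier (the weights, sizes, and class index are invented for illustration; for a real convnet you would compute the input gradient with autodiff rather than reading off a weight row):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 64))  # toy model: logits = W @ x, 10 classes

def maximize_class_score(x, class_idx, lr=0.1, steps=100):
    """Gradient ascent on one class's logit with respect to the input image x."""
    for _ in range(steps):
        # for this linear model, d(logits[class_idx]) / dx is simply row class_idx of W
        x = x + lr * W[class_idx]
    return x

x0 = np.zeros(64)                    # start from a blank image
x1 = maximize_class_score(x0, class_idx=3)
print((W @ x1)[3] > (W @ x0)[3])     # True: the chosen class's score strictly increases
```

The hands appear in the dumbbell image because the ascent can only reflect what the model learned from its data, where dumbbells always co-occur with arms.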
And then, what are the implications and use cases of these visualizations? [01:34:02] You can use saliency maps for segmentation, as in the assignment; that's not very useful given the newer methods we have, but the deconvolution that we've seen together is widely used for segmentation and reconstruction, and also in generative adversarial networks to generate images. [01:34:18] Sometimes these visualizations are also helpful to detect whether some of the neurons in your network are dead. Let's say you have a network, you use the toolbox, and you see that whatever input image you give, some feature maps are always dark: it means that the filter that generated that feature map, by convolving over the inputs, probably never detected anything, so it's not even being trained. That's the type of insight you can get. Okay, thanks guys; sorry we went over time. ================================================================================ LECTURE 008
================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 8 - Career Advice / Reading Research Papers Source: https://www.youtube.com/watch?v=733m6qBH-jI --- Transcript [00:00:05] Okay everyone, let's get going. As usual, if you have not yet, please enter your SUID so that we know you're here in this room. [00:00:19] Is the computer okay at the back? Is the volume okay at the back? All right, no one's responding... yes? Okay, all right, thank you. [00:00:33] So what I want to do today is share two things. You know, we're approaching the end of the quarter; I hope you guys are looking forward to the Thanksgiving break next week. And those of you viewing this from outside California, know that we're all feeling the really bad air here in California; if some of you are watching from home, I hope you have better air wherever you are. [00:00:59] But what I hope to do today is give you some advice
that will set you up for the future, even beyond the conclusion of CS230. In particular, what I want to do today is share with you some advice on how to read research papers, because deep learning is evolving fast enough that even though you've learned a lot of the foundations of deep learning, learned the tips and tricks, and currently know better than many practitioners how to actually get deep learning algorithms to work, when you're working on specific applications, whether in computer vision or natural language processing or speech recognition or something else, being able to efficiently navigate the academic literature on key parts of the deep learning world will help you keep developing and stay on top of ideas even as they evolve over the next several years or maybe decades. [00:01:53] So the first thing I want to do is give you advice on how, say,
when I'm trying to master a new body of literature, I go about that, in the hope that those techniques will help you be more efficient in how you read research papers. And the second thing is that in previous offerings of this course, one request from a lot of students was advice on navigating a career in machine learning, so in the second half of today I want to share some thoughts with you on that. [00:02:26] Okay, so I guess two topics: reading research papers, and second, career advice in machine learning. [00:02:35] It turns out that reading research papers is one of those things that a lot of PhD students learn by osmosis, meaning that if you're a PhD student, you'll see a few professors and other PhD students do certain things and try to pick them up by osmosis, but I hope today to accelerate
your efficiency in how you acquire knowledge yourself from the academic literature. [00:03:05] So let's say there's an area you want to become good at; say you want to build a speech recognition system. I'll use that example for now: say you want to build the speech recognition system we talked about, with the "Robert, turn on the desk lamp" example. There's a sequence of steps I recommend you take, which is: first, compile a list of papers. And by papers I mean both research papers, often posted on arXiv or elsewhere on the internet, but also maybe Medium posts, and maybe some occasional GitHub posts; those are rarer, but whatever texts or learning resources you have. [00:03:54] And then what I usually do is end up skipping around the list. So if I'm trying to master a new body of knowledge, say you want to learn about speech recognition systems, this is
what it feels like to read a set of papers: maybe you initially start off with five papers, and if on the horizontal axis I plot 0% to 100% read / understood, the way it feels to read these papers is often: you read 10% of each paper, trying to quickly skim and understand each of them, and if based on that you decide that paper two is a dud (other authors have even cited it and said, boy, they sure got it wrong, and when you read it, it just doesn't make sense), then go ahead and forget it. [00:04:49] And as you skip around to different papers, you might decide that paper three is the really seminal one, and then spend a lot of time going ahead and reading and understanding the whole thing. Based on that, you might then find a sixth paper from the citations and read that, go back and make sure you understand paper four, and then find a paper seven and go and read
that all [00:05:10] a paper seven and go and read that all the way to the conclusion but this is [00:05:13] the way to the conclusion but this is what it feels like as you you know [00:05:15] what it feels like as you you know assemble a list of papers and skip [00:05:16] assemble a list of papers and skip around and try to master a body of [00:05:20] around and try to master a body of literature right around some topic that [00:05:22] literature right around some topic that you want to learn and I think um some [00:05:25] you want to learn and I think um some rough guidelines you know if you read [00:05:28] rough guidelines you know if you read fifty to twenty papers I think you have [00:05:30] fifty to twenty papers I think you have a basic understanding of an area right [00:05:32] a basic understanding of an area right may be good enough to do some work apply [00:05:35] may be good enough to do some work apply some algorithms if you read 50 to 100 [00:05:39] some algorithms if you read 50 to 100 papers in an area and they speech [00:05:41] papers in an area and they speech recognition and and kind of understand a [00:05:43] recognition and and kind of understand a lot of it then that's pretty enough to [00:05:45] lot of it then that's pretty enough to give you a very good understanding of an [00:05:47] give you a very good understanding of an area right you you might I don't know [00:05:49] area right you you might I don't know I'm always careful about when I say you [00:05:51] I'm always careful about when I say you know you're mastering a subject but you [00:05:53] know you're mastering a subject but you read fifty a hundred papers on speech [00:05:54] read fifty a hundred papers on speech recognition you have a very good [00:05:56] recognition you have a very good understanding of speech recognition or [00:05:58] understanding of speech recognition or if you're interested in say domain [00:05:59] if you're interested in say domain adaptation right by the time 
you've read 50 or 100 papers, you have a very good understanding of a subject like that. But the 15 to 20 papers is probably enough for you to implement things, though maybe not enough for you to do research or be really at the cutting edge. These are maybe some guidelines for the volume of reading you should aspire to if you want to pick up a new area, or take one of the subjects in CS230 and go more deeply into it. [00:06:22] Now, how do you read one paper? I hope most of you brought your laptops, so what I'm going to do is describe how I read one paper, and then after that I'm going to ask all of you to download a paper online and take a few minutes to read it, right here in class, and see how far you can get understanding a research paper in
just a few minutes, right here in class. [00:06:55] Okay, so when reading one paper: the bad way to read a paper is to go from the first word until the last word. Oh, and by the way, let me tell you what my real life is like. Pretty much everywhere I go, in my backpack, this is my actual folder of unread papers. So pretty much everywhere I go, I actually have a stack of papers that is my personal reading list. This is actually my real life; I didn't bring this just to show you, it's in my backpack all the time. [00:07:34] And these days, on my team at Landing AI and deeplearning.ai, I personally lead a reading group where I lead a discussion about two papers a week, but to select two papers it means I need to read like five or six papers a week to select the two, you know, to
present or discuss at the Landing AI and deeplearning.ai reading group meeting. So this is my real life, and how I try to stay on top of the literature, and if I can find the time to read a couple of papers a week, hopefully all of you can too. [00:08:07] But when I'm reading a paper, this is how I would recommend you go about it: don't go from the first word and read until the last word; instead, take multiple passes through the paper. [00:08:23] So step one is: read the title, the abstract, and also the figures. Especially in deep learning, there are a lot of research papers where the entire paper is summarized in one or two figures and the figure captions. So sometimes, just by reading the title, the abstract, and the key neural network architecture figure that describes what the
whole paper is about, and maybe one or two of the experiments, you can get a very good sense of what the whole paper is about without hardly reading any of the text in the paper itself. That's the first pass. [00:09:07] Second pass: I tend to read more carefully the intro and the conclusions, look carefully at all the figures again, and then skim the rest. [00:09:31] And I don't know how many of you have published academic papers, but when people publish academic papers, part of the publication process is convincing the reviewers that your paper is worthy of acceptance, and so what you find is that the abstract, intro, and conclusion are where the authors summarize their work really carefully, to make a really clear case to the reviewers for why they think their paper should be accepted
for publication. And because of that, maybe it's a slightly unusual incentive, the intro and conclusion often give a very clear summary of what the paper is actually about. [00:10:20] And, to be bluntly honest with you guys: the related work section is sometimes useful if you want to understand related work and figure out what the most important works in the area are, but the first time through you might skim or even skip the related work section. It turns out that unless you're really familiar with the literature, if this is a body of work you're not that familiar with, the related work section is sometimes almost impossible to understand. And again, since I'm being very honest with you guys, sometimes the related work section is where the authors try to cite everyone that could
possibly be reviewing the paper, to make them feel good and hopefully accept the paper, so related work sections are sometimes written in funny ways. [00:11:02] And then step three: I often read the paper but skip the math, or read the whole thing but skip past anything that doesn't make sense. [00:11:44] You know, one thing that has happened many times in research is that papers tend to be cutting-edge research, and so when we publish things we sometimes don't know what's really important and what's not. So there are many examples of well-known, highly cited research papers where some of it was just great stuff and some of it turned out to be unimportant, but at the time the paper was written the authors did not know (no one on the planet knew) what was important and what was not. And maybe one example is the LeNet-5 paper, the
[00:12:17] Part of it was phenomenal: it established a lot of the foundations of convnets, and so it's one of the most incredibly influential papers. But if you go back and read that paper, a whole other half of the paper was about other stuff, right, transducers and so on, that is much less used. And so it's fine if you read a paper and some of it doesn't make sense, because that's not that unusual. Sometimes it just happens that great research means we're publishing things at the boundaries of our knowledge, and sometimes, for the stuff you see, you'll realize five years in the future that it wasn't the most important thing after all, that the key part of the algorithm maybe wasn't what the authors thought. So sometimes parts of a paper don't make sense; it's okay to skim it initially and move on.
[00:13:01] Unless you're trying to do deep research and really need to master it, in which case go ahead and spend more time, if you're trying to get through a lot of papers, then, you know, it's just about prioritizing your time. Okay, and so just a few last things, and then I'll ask you to practice this yourself with a paper, right. [00:13:27] Um, you know, I think that when you've read a paper, these are questions to try to keep in mind, and when you read a paper, in a few minutes maybe, try to answer these questions: what did the authors try to accomplish, what were the key elements, and what can you use yourself? What I hope to do in a few minutes is ask you to download a paper off the internet, read it, and then try to answer these questions and discuss your answers with your peers, with others in the class.
[00:14:45] Um, okay, so I think if you can answer these questions, hopefully that will reflect that you have a pretty good understanding of the paper. Okay, and so what I would like you to do is pull out your laptop. [00:15:01] So I think on the convnet videos, right, on the deeplearning.ai convnet videos on Coursera, you learned a bit about various neural network architectures such as ResNets, and it turns out there's a follow-on piece of work that builds on some of the ideas of ResNets, which is called DenseNet. So what I'd like you to do is actually try this. And when I'm reading a paper, again, in the earliest stages, don't get stuck on the math; just go ahead and skim the math and read the English text, which you can get through faster. Maybe one of the principles is to go to the very efficient, high-information-content parts first and then go to the harder material later. [00:15:42] Remember, that's why often I'll just skim the math, and if I don't understand some derivation I'll just move on, and only later go back and really try to figure out the math more carefully, okay?

[00:15:54] So what I'd like you to do is, let's see, let's have you take seven minutes, where I'm thinking maybe one minute per page, which is quite fast, and search for this paper: Densely Connected Convolutional Networks. [00:16:24] Okay, so once you guys take out your laptops, search for this paper and download it; you'll usually find it on arXiv, right, and this is also sometimes called DenseNets, I guess. Go ahead and take like seven minutes to read this paper, and I'll let you know when the time has passed, and then after that time I'd like you to, you know, discuss your answers.
[00:16:56] Work with the others, right, on what you think the answers are, especially the first two questions; the other two you can leave off. So go ahead and take a few minutes to do that now, and I'll let you know when, sort of, seven minutes have passed, and then you can discuss your answers with your friends.

[00:17:20] All right guys, so, anyone with any thoughts or insights or surprises from this? So now you've spent eleven minutes on this paper, right: seven minutes reading, four minutes discussing, which is a really, really short period of time, but any thoughts? What do you think of the paper? [00:17:41] You all just spent a lot of time saying stuff to each other; what do people think of the time you spent trying to read the paper? Actually, tell you what: raise your hand if you, you know, kind of understood the main concepts and the figures.
[00:18:20] Wow, people are really less than energetic today; unusual. So I think this is one of those papers where the paper is almost entirely summarized in Figures 1 and 2. If you look at Figure 1 and the caption there, and Figure 2 on page 3 and the caption there, and understand those two figures, those really convey, you know, 80% of the idea of the paper, right? [00:18:59] And I think a couple of other tips. So it turns out that as you read these papers, with practice you do get faster. So for example, Table 1 on page 4, right, this mass of a table at the top: this is a pretty common format, or a format like this is what a lot of authors use to describe their network architecture, especially in computer vision. So one of the things you find as well is that the first time you see something like Table 1 it just looks really complicated, but by the time you've read a few papers in a similar format, you can look at Table 1 and go, oh yep, got it, you know, this is the DenseNet-121 and this is the 169 architecture, and be able to pick those things up more quickly. And so another thing you'll find is that reading these papers actually gets better with practice, because you see different authors use different ways, or similar ways, of expressing themselves, and as you get used to that you'll actually get faster and faster at understanding these ideas.

[00:19:59] And I think, you know, these days when I'm reading a paper like this, it maybe takes me about half an hour, and I know I gave you guys seven minutes when I thought I would need maybe half an hour to figure out a paper like this. And I find it's not unusual for people relatively new to machine learning to need maybe an hour to kind of, you know, really understand a paper like this, and then, although I'm pretty experienced in machine learning, I'm down to maybe half an hour for a paper like this, maybe even twenty minutes if it's a really easy one. But there are some outliers: my colleagues and I sometimes stumble across a really difficult paper where you need to chase down a lot of references and learn a lot of other things first, so sometimes you come across a paper that takes you three or four hours, or even longer, to really understand. But I think, depending on how much time you want to spend, by reading papers you can actually learn a lot, right, doing what you just did, but maybe spending half an hour or an hour per paper rather than seven minutes. [00:21:01] So, all right, I feel like, yeah, that's great.
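If you want to check your read of Figures 1 and 2, the dense-connectivity idea they illustrate can be sketched in a few lines of NumPy. This is only a toy on flat vectors, with random linear maps standing in for the paper's actual BN-ReLU-Conv layers; the function name, shapes, and growth-rate value here are illustrative choices, not the authors' implementation.

```python
import numpy as np

def toy_dense_block(x, num_layers, growth_rate, rng):
    """Toy DenseNet-style block on a flat feature vector.

    Each 'layer' is a random linear map (standing in for BN-ReLU-Conv)
    whose input is the concatenation of x and ALL previous outputs,
    and which emits growth_rate new features.
    """
    features = [x]  # running list of feature maps
    for _ in range(num_layers):
        concat = np.concatenate(features)            # dense connectivity
        w = rng.standard_normal((growth_rate, concat.size))
        features.append(np.tanh(w @ concat))         # k new features
    return np.concatenate(features)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
out = toy_dense_block(x, num_layers=4, growth_rate=3, rng=rng)
# output size = input size 8 + (4 layers x growth rate 3) = 20
print(out.shape)  # (20,)
```

The point the figures make is visible in the arithmetic: the feature count grows linearly (input plus layers times growth rate) because every layer's output is kept and re-fed to all later layers.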
[00:21:09] And notice that I've actually not said anything about the content of this paper, right? So whatever you guys just learned, that was all you; I had nothing to do with it. So there you go: you can go and learn this stuff by yourself, you don't need me anymore. [00:21:22] So just the last few comments, let's see. The other question I often get is, uh, you know, where do you go? The deep learning field evolves so rapidly, so where do you go if you're trying to master a new body of knowledge? Definitely do web searches, and there are often good blog posts on, you know, here are the most important papers in speech recognition; there are lots of great resources there. And then the other thing a lot of people want to do is try to keep up with the state of the art of deep learning even as it's evolving rapidly, and so I'll just tell you where I go to keep up with, you know, discussions and announcements. [00:22:05] Surprisingly, Twitter is becoming a surprisingly important place for researchers to find out about new things. There's an ML subreddit that's actually pretty good: a lot of noise, but many important pieces of work do get mentioned there. Some of the top machine learning conferences are NIPS, ICML, and ICLR, right, and so whenever these conferences come around, take a look and glance through at least the titles to see if there's something that interests you. [00:22:38] And then I think I'm fortunate, I guess, to have friends, you know, both colleagues here at Stanford as well as colleagues on several of the teams that I work with, that just tell me when there's a cool paper. But I think, whether here within Stanford or in your workplace, for those of you taking this via SCPD, see if you can form a community that shares interesting papers. So all the groups I'm in are on Slack, and we regularly message each other when we find interesting papers.
[00:23:01] That's been great for me, actually, yeah. Oh, and Twitter. Let's see, Kian as well, you could follow him too; this is me, Andrew Ng, right. He probably doesn't share papers as often as I do, but, I don't know, you can also look at who we follow, and there will be a lot of good researchers that share all these things online. Oh, and there's a bunch of people that also use a website called Arxiv Sanity; I don't use it much myself, but lots of researchers like it. [00:23:43] Um, cool. So just two last tips for how to read papers and get good at this. [00:23:59] So, to more deeply understand a paper: some of the papers will have math in them, and actually, you'll learn about BatchNorm, right, in the second module, and if you read the BatchNorm paper, it's actually one of the harder papers you'll read.
you read The Bachelor [00:24:14] second modules if you read The Bachelor on paper is actually one of harder [00:24:16] on paper is actually one of harder papers you read there's a lot of math in [00:24:19] papers you read there's a lot of math in the derivation or vaginal but they're [00:24:21] the derivation or vaginal but they're papers like that and if you want to make [00:24:22] papers like that and if you want to make sure you understand in math here's what [00:24:25] sure you understand in math here's what I would recommend which is a read [00:24:27] I would recommend which is a read through it take detailed notes and then [00:24:30] through it take detailed notes and then see if you can read arrive in from [00:24:31] see if you can read arrive in from scratch so if you want to deeply [00:24:36] scratch so if you want to deeply understand the math of an algorithm like [00:24:37] understand the math of an algorithm like you know fashion or more the details of [00:24:40] you know fashion or more the details of back problems [00:24:41] back problems the good practice and I think a lot of [00:24:44] the good practice and I think a lot of them sort of a theory their own from the [00:24:48] them sort of a theory their own from the science and mathematics PhD says will [00:24:50] science and mathematics PhD says will use a practice like this [00:24:51] use a practice like this we're just go ahead and read the paper [00:24:52] we're just go ahead and read the paper make sure you understand it and then to [00:24:54] make sure you understand it and then to make sure you really really understand [00:24:56] make sure you really really understand it put a put aside the result and try to [00:25:00] it put a put aside the result and try to read arrive the math yourself from [00:25:02] read arrive the math yourself from scratch and you can start from a blank [00:25:03] scratch and you can start from a blank piece of paper and read arrive one of [00:25:05] piece of paper and 
read arrive one of these algorithms from scratch then [00:25:07] these algorithms from scratch then that's a good sign that you really [00:25:08] that's a good sign that you really understand it [00:25:09] understand it when I was a PhD student I did this a [00:25:11] when I was a PhD student I did this a lot right that you know I wouldn't be [00:25:13] lot right that you know I wouldn't be the text book or read the paper or [00:25:15] the text book or read the paper or something and then put aside whether I [00:25:17] something and then put aside whether I read and see if I could read arrived it [00:25:18] read and see if I could read arrived it from scratch starting from a blank piece [00:25:20] from scratch starting from a blank piece of paper as only if I could do that that [00:25:22] of paper as only if I could do that that I would you know feel like yep I think I [00:25:24] I would you know feel like yep I think I understand this piece of math and it [00:25:26] understand this piece of math and it turns out if you want me to do this type [00:25:27] turns out if you want me to do this type of map yourself is your ability to [00:25:30] of map yourself is your ability to derive this type of map we divide the [00:25:32] derive this type of map we divide the size of math that gives you the ability [00:25:34] size of math that gives you the ability to generalize to derive new novel pieces [00:25:37] to generalize to derive new novel pieces of map yourself so I think I actually [00:25:39] of map yourself so I think I actually learned all the math for several machine [00:25:41] learned all the math for several machine learning by doing this and this by read [00:25:43] learning by doing this and this by read arriving other people's work that [00:25:44] arriving other people's work that allowed me to learn how to divide my own [00:25:46] allowed me to learn how to divide my own novel algorithms and actually sometimes [00:25:49] novel algorithms and actually sometimes 
[00:25:49] And actually, sometimes you go to art galleries, right: you go to the Smithsonian and you see these art students, you know, sitting on the floor copying the great artworks, recreating paintings painted by the masters centuries ago. And so, just as today there are students sitting in the DeYoung Museum or wherever (I was at the Getty Museum in LA a few months ago and you actually see these art students copying the work of the masters), I think that if you want to become good at the math of machine learning yourself, this is a good way to do it. It's time-consuming, but then you can become good at it. [00:26:25] Anyway, and the same thing for code, right: I think the simple, you know, lightweight version of learning would be to download and run the open-source code if you can find it, and the deeper way to learn this material is to reimplement it from scratch.
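As one small example of what a from-scratch reimplementation can look like, here is a minimal sketch of the training-mode BatchNorm forward pass discussed earlier, in NumPy. The function name, shapes, and the self-check at the end are my own illustrative choices, not any particular library's API; the real exercise is writing something like this without looking at a reference.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm forward pass, from scratch.

    x: array of shape (batch, features). Each feature is normalized
    over the batch, then scaled by gamma and shifted by beta.
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta

# self-check: with gamma=1 and beta=0, every output column
# should have roughly zero mean and unit variance
x = np.random.default_rng(0).standard_normal((64, 4))
y = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(np.allclose(y.mean(axis=0), 0, atol=1e-6),
      np.allclose(y.std(axis=0), 1, atol=1e-2))  # prints: True True
```

A check like the one at the end is the coding analogue of re-deriving the math: you verify your implementation against a property you know must hold, rather than against someone else's code.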
scratch it's easy to download open [00:26:52] from scratch it's easy to download open sourcing and rather [00:26:53] sourcing and rather it works but if you can reemployment one [00:26:55] it works but if you can reemployment one of these algorithms from scratch then [00:26:58] of these algorithms from scratch then that's a strong sign that you really [00:26:59] that's a strong sign that you really understood this our problem okay all [00:27:06] understood this our problem okay all right and then longer-term advice um you [00:27:25] right and then longer-term advice um you know for you to keep on learning and [00:27:26] know for you to keep on learning and keep on getting better and better the [00:27:28] keep on getting better and better the more important thing is for you to learn [00:27:30] more important thing is for you to learn steadily not for you to have a focus [00:27:32] steadily not for you to have a focus intense activity you know like all of [00:27:35] intense activity you know like all of Thanksgiving you read 50 papers over [00:27:37] Thanksgiving you read 50 papers over Thanksgiving and then you're done for [00:27:39] Thanksgiving and then you're done for the rest of your life and it doesn't [00:27:40] the rest of your life and it doesn't work like that right and I think you're [00:27:42] work like that right and I think you're actually much better off reading to a [00:27:43] actually much better off reading to a few papers a week for the next year then [00:27:46] few papers a week for the next year then you know cramming everything right over [00:27:49] you know cramming everything right over over one long weekend or something [00:27:50] over one long weekend or something actually an education where she know [00:27:52] actually an education where she know that spaced repetition works better than [00:27:54] that spaced repetition works better than cramming so the same same thing same [00:27:57] cramming so the same same thing same body of learning if 
[00:27:58] For the same body of learning, if you learn a bit every week, spaced out, you actually have much better long-term retention than if you try to cram it in over a short term; this is a very solid result that we know from pedagogy and how the human brain works. [00:28:11] So, again, the way I live my life is: in my backpack I just always have a few papers with me. And you'll find that I read almost everything on a tablet, on my iPad, but I find that for research papers and things like that, where the ability to flip between pages and skim matters, I'm still more efficient on paper. So I read almost nothing on paper these days except for research papers, but that's just me; your mileage may vary, and maybe something else will work for you. Okay, all right, um, so let's see, that's it for reading research papers.
[00:28:44] I hope that while you're in CS230, you know, if some of you find some cool papers, or if you go further with the DenseNet paper and find an interesting result there, go ahead and post on Piazza. Or if any of you want to start a reading group with other friends here at Stanford, you know, I encourage you to look around the class and find a group here on campus, or among your CS230 classmates, or with your work colleagues for those of you taking this on SCPD, so that you can all keep, you know, studying together and learning and helping each other along. [00:29:19] Okay, so that's it for reading papers. The second thing I want to do today is give some longer-term advice on navigating a career in machine learning, right. Any questions about this before I move on? [00:29:33] All right, but I hope that was useful; some of this I wish I had known when I was a first-year PhD student.
all right, um, let's see, can we turn on the lights please? [00:29:52] Oh, all right. So kind of in response to requests from students in earlier versions of the class, you know, as we approach the end of the quarter I want to give some advice on how to navigate a career in machine learning. So today in machine learning there are so many opportunities to do so many exciting things; so how do you, you know, what do you want to do? [00:30:13] I'm going to assume that most of you will want to do one of two things. At some point, you know, you want to get a job, maybe a job that does work in machine learning, including a faculty position for those who want to become professors, but I guess eventually most people end up with a job; I guess there are other alternatives. But then some of you want to go on to more advanced graduate studies, although even after you get your PhD, at
some point most people do get a job after the PhD, and by job I mean either at a big company, you know, or a startup. But regardless of the details of this, I hope most of you want to do important work. [00:31:07] So what I'd like to do today is break down, you know, how you find a job, or join a PhD program, or whatever it is that lets you do important work. And I want to break this discussion into two steps. One is just, you know, how do you get a position: how do you get that job offer, or how do you get that offer of admission to the PhD program, or admission to the master's program, whatever you want to do. And then two is selecting a position: between, you know, going to this university versus that university, or between taking the job at this company versus that company, which are the ones that will tend to set you up for success, for
long-term personal success and career success and everything. By the way, I hope that all of these are just tactics to let you do important work; I hope that's what you want to do. Um, so, you know, what do recruiters look for? [00:32:12] And I think, just to keep the language simple, I'm going to talk about finding a job, but a lot of the same things apply for PhD programs; it's just that instead of saying recruiters I would say admissions committees. So let me just focus on the job scenario. [00:32:28] Most recruiters look for technical skills. For example, in a lot of machine learning interviews they'll ask you questions like, you know, would you use gradient descent or batch gradient descent or mini-batches, and what happens if the mini-batch size is too large or too small? Right, so
there are companies, many companies today, asking questions like that in the interview process, or: can you explain what a GRU is, and when would you use a GRU? So you really do get questions like that in many job interviews today. [00:33:01] So recruiters are looking for ML skills, and you'll often be quizzed on ML skills as well as your coding ability; and I think Silicon Valley has become quite good at giving people assessments to test for real skill in machine learning engineering and software engineering. Um, and then the other thing that many recruiters will look for is meaningful work. [00:33:43] In particular, you know, there are some candidates that apply for jobs that have very theoretical, very academic skills, meaning you can answer all the quiz questions about, you know, what batch norm is and how it's designed, but unless you've actually shown that you can apply this in a meaningful setting, it's harder to convince a company or recruiter that you don't just know the theory but that you know how to actually make this stuff work. And so having done meaningful work using machine learning makes you a very desirable candidate, I think, to a lot of companies. Work experience, whether you've done something meaningful, reassures them that you can actually do the work; that it's not just answering academic quiz questions, but being able to implement the algorithms so that they work. [00:34:38] And then many recruiters also look for your ability to keep on learning new skills and stay on top of machine learning as it evolves. And so a very common pattern for the successful, you know, AI
engineer or machine learning engineer would be the following. If on the horizontal axis I plot different areas, so you might learn about machine learning, learn about deep learning, learn about probabilistic graphical models, learn about NLP, learn about computer vision, and so on for other areas of AI or machine learning, and the vertical axis is depth, then a lot of the strongest candidates for jobs are T-shaped individuals, meaning that you have a broad understanding of many different topics in AI and machine learning and a very deep understanding in, you know, maybe at least one area, maybe more than one. [00:35:33] And so I think by taking CS230 and doing the things you're doing here, hopefully you're forming a deeper understanding of one of these areas, of deep learning in particular. But the other thing that, you know, deepens your knowledge in one area will be the
projects you work on, the open-source contributions you make, whether or not you've done research, and maybe whether or not you've done an internship. [00:36:08] And I think these two elements, you know, a broad base of skills, and then also going deeper to do a meaningful project in deep learning, or working with a Stanford professor and doing a meaningful research project, or making some contributions to open source published on GitHub that people then use, these are the things that let you deepen your knowledge and convince recruiters that you both have the broad technical skills and, when called on, are able to apply them in a meaningful way to an important problem. [00:36:37] And in fact, the way we designed CS230 is actually a microcosm of this, where, you know, you learn about neural nets, learn about dropout, batch norm, ConvNets, sequence models, right, RNNs; so it actually gives you breadth within the field of deep learning. And then the reason we want you to work on the project is so that you can pick one of these areas and maybe go deep and build a meaningful project in one of them. And it's not just about making a resume look good, right; it's about giving you the practical experience to make sure you actually know how to make these things work, giving you the learnings to make sure you actually know how to make a CNN or an RNN work, and then of course many students also list the projects on their resumes, obviously. [00:37:38] So let's see the failure modes, the bad ways to navigate your career. Um, there are some students that just do this: there are some Stanford students that just take class after class after class and go equally in depth in a huge range of areas. And this is not terrible; you can actually still get a job, sometimes you can even get into some PhD programs like this, without the depth, but this is not the best way to navigate your career. So there are some Stanford students that take tons of classes, even get a good GPA doing that, but do nothing else, and this is not terrible, but it's not great; it's not as good as the alternative. [00:38:22] Um, there's one other thing I've seen Stanford students do, which is just try to jump in on day one and go really, really deep in one area, and again this has its own challenges. I guess, you know, one failure mode that's actually not great is sometimes you get some undergrad freshmen at Stanford that have not yet
learned a lot about coding or software engineering or machine learning and try to jump into a research project right away. This turns out not to be very efficient, because it turns out that courses, you know, online courses and Stanford classes, are a very efficient way for you to learn about these broad areas, and after that, going deeper and getting experience in one vertical area then deepens this knowledge and makes sure you know how to actually make those ideas work. [00:39:08] So I do see sometimes, unfortunately, you know, some Stanford freshmen join us already knowing how to code and having implemented, you know, some learning algorithms, but some students that do not yet have much experience try to jump into a research project right away, and that turns out not to be very productive for the student or for the research group, because until you've taken classes and mastered the basics it's difficult to understand what's
really going on in the advanced projects. So I would say this is actually worse than that: this is actually okay, this is actually pretty bad; I would not do this for your career. [00:39:46] And then the other not-so-great path that you see some Stanford students take is to get a lot of breadth and then do a tiny project here, a tiny project there, a tiny project there, and you end up with ten tiny projects but no one or two really significant projects. And again, this is not terrible, but you know, beyond a certain point, by the way, recruiters are not impressed by volume. So having done ten little projects is actually not nearly as impressive as doing one great project or two great projects. And again, there's more to life than impressing recruiters, but recruiters are very rational, and the
reason recruiters are less impressed by someone whose profile looks like this is because they're probably actually less skilled and less able at doing good work in machine learning compared to someone that has done a substantive project and knows what it takes to see the whole thing through. Does that make sense? So when I say, you know, recruiters are more or less impressed, it's because they're actually quite rational in terms of trying to understand how good you are at doing important work or building meaningful AI systems. [00:41:02] And so in terms of building up the horizontal piece and the vertical piece, this is what I recommend. To build the horizontal piece, a lot of this is about building foundational skills, and it turns out coursework is a very efficient way to do this; you know, in these courses, various instructors like us but many other Stanford
professors put a lot of work into organizing the content to make it efficient for you to learn this material. And then there's also reading research papers, which we just talked about; having a community will help you there. And then the depth often comes from building a deep [00:41:54] and relevant project, and the projects do have to be relevant. So, you know, if you want to build a career in machine learning or in AI, hopefully the project is something that's relevant to CS or machine learning or AI or deep learning. [00:42:07] I do see, I don't know, for some reason I feel like a surprisingly large number of Stanford students I know join a dance crew and spend a long time on that, which is fine; if you enjoy dancing, go have fun, you know, you don't need to work all the time. So go join the dance crew, or go on the overseas exchange program and hang out in London and have fun, but those
things do not as directly contribute to this, right. Yeah, I think in an earlier version of this presentation, you know, students walked away saying, huh, Andrew says we should not have fun and should work all the time, and that's not the goal. [00:43:15] Um, all right, you know, there is the Saturday morning problem, which all of you will face, which is that every week, including this week, on Saturday morning you have a choice: you can read a paper, or work on research, or work on open source, or I don't know what people do, or you can watch TV or something. And you will face this choice maybe every Saturday, you know, for the rest of your life, or for a lot of the Saturdays in the rest of your life. And, um, you know, you can build out those foundational skills, go deep, or go have fun, and you should have fun, all right, just for the record. But one of the problems
that a lot of people face is that even if you spend all Saturday and all Sunday reading research papers, or maybe spend all Saturday and Sunday working hard, it turns out that the following Monday you're not that much better at deep learning. Yeah, you worked really hard, you read five papers, you know, great, but if you work in a research group, the professor, or, you know, your manager if you're in a company, they have no idea how hard you worked, so there's no one to come by and say, oh, good job working so hard all weekend. So no one knows the sacrifices you made all weekend to study or code open-source projects; no one knows. So there's almost no short-term reward to doing these things, whereas there might be short-term rewards to doing other things. [00:45:01] But the secret to this is that it's not about
reading papers really, really hard for one Saturday morning, or for all of one Saturday, and then being done. The secret is to do this consistently, you know, for years, or at least for months. And it turns out that if you read, um, two papers a week and you do that for a year, then you've read about a hundred papers after a year, right, a hundred papers in a year comes out to about two papers a week, and you will be much better at deep learning after that. [00:45:37] And so for you to be successful, it's much less about the intense burst of effort you put in over one weekend; it's much more about whether you can find a little bit of time every week to read a few papers or contribute to open source or take some online courses. And if you do that, you know, every week for six months, or every week for a year, you will actually
learn a lot about these fields and be much better off, and be much more capable at deep learning or machine learning or whatever. [00:46:09] So yeah, actually my wife and I do not own a TV, for what it's worth. But again, you know, make sure you don't drive yourself crazy, and have a healthy work-life integration as well. [00:46:27] All right, so I hope that doing these things, whoa, it's not just about finding a job; it's about doing these things to make you more capable as a machine learning person, so that you have the power to go and implement stuff that matters, to do work that matters. [00:46:49] Well, the second thing I'd like to chat about is selecting a job. And this is actually interesting, um, I gave this public presentation last year, sorry, earlier this year, and shortly after that presentation there was a student in the class who was already
there was already in a company who emailed me saying boy [00:47:09] in a company who emailed me saying boy Andrew I wish you had told me this [00:47:10] Andrew I wish you had told me this before I said to my current job [00:47:12] before I said to my current job so let's see let's see let's see [00:47:16] so let's see let's see let's see hopefully this is be useful to you um so [00:47:19] hopefully this is be useful to you um so it turns out that you know I so when [00:47:26] it turns out that you know I so when you're at some point you're on you be [00:47:27] you're at some point you're on you be deciding you know what peeps apparently [00:47:28] deciding you know what peeps apparently wanna apply for what companies do want [00:47:30] wanna apply for what companies do want higher job ads and I can tell you what [00:47:38] so if you want to keep learning new [00:47:40] so if you want to keep learning new things I think one of the biggest [00:47:43] things I think one of the biggest predictors of your success will be [00:47:45] predictors of your success will be whether or not you're working with great [00:47:47] whether or not you're working with great people and projects right and in [00:47:56] people and projects right and in particular you know there are these [00:47:58] particular you know there are these fascinating results from whether I think [00:48:01] fascinating results from whether I think I want to say from the social sciences [00:48:02] I want to say from the social sciences showing that if your closest friends if [00:48:06] showing that if your closest friends if your five closest friends retain closer [00:48:08] your five closest friends retain closer friends are all smokers there's a much [00:48:10] friends are all smokers there's a much higher chance you become a smoker as [00:48:11] higher chance you become a smoker as well right and if you're five or ten [00:48:13] well right and if you're five or ten close friends are you know overweight 
[00:48:16] close friends are you know overweight there's much higher chance you do the [00:48:18] there's much higher chance you do the same or and conversely there's a you [00:48:21] same or and conversely there's a you know so I think that if your five [00:48:23] know so I think that if your five closest friends work really hard really [00:48:25] closest friends work really hard really long research papers care about the work [00:48:27] long research papers care about the work right learning and making themselves [00:48:29] right learning and making themselves better then there's actually very good [00:48:30] better then there's actually very good chance that you will be that they'll [00:48:32] chance that you will be that they'll influence you that way as well so we're [00:48:34] influence you that way as well so we're all human we all influenced by the [00:48:36] all human we all influenced by the people around us right and so um I think [00:48:40] people around us right and so um I think that and I've been fortunate I've told [00:48:42] that and I've been fortunate I've told the Stanford for a long time now is I've [00:48:44] the Stanford for a long time now is I've been fortunate to have seen a lot of [00:48:46] been fortunate to have seen a lot of students from go from Stanford to [00:48:48] students from go from Stanford to various careers and because I've seen [00:48:50] various careers and because I've seen how many hundreds or maybe low thousands [00:48:53] how many hundreds or maybe low thousands understand the students that I knew [00:48:54] understand the students that I knew right when there are so stem forces go [00:48:56] right when there are so stem forces go on to a separate job I saw many of them [00:48:58] on to a separate job I saw many of them have amazing careers I saw you know if [00:49:01] have amazing careers I saw you know if you have like like okay careers but I [00:49:04] you have like like okay careers but I think over time I've 
learned to patent [00:49:06] think over time I've learned to patent match what is predictive of your future [00:49:09] match what is predictive of your future success after you leave Stanford only [00:49:11] success after you leave Stanford only share view some of those paths share [00:49:12] share view some of those paths share view some of those patterns as you [00:49:13] view some of those patterns as you navigate your career and and and it's [00:49:16] navigate your career and and and it's just a so many options in machine [00:49:17] just a so many options in machine learning today it's kind of tragic if [00:49:19] learning today it's kind of tragic if you don't you know navigate to hopefully [00:49:21] you don't you know navigate to hopefully maximized [00:49:22] maximized one of the people that gets to do fun [00:49:25] one of the people that gets to do fun and important work that helps helps [00:49:26] and important work that helps helps others so when selecting a position I [00:49:32] others so when selecting a position I would advise you to focus on the team [00:49:43] you interact with and by team I mean you [00:49:46] you interact with and by team I mean you know somewhere between ten to thirty [00:49:49] know somewhere between ten to thirty persons right maybe up to fifty because [00:49:53] persons right maybe up to fifty because it turns out that you if you there will [00:49:57] it turns out that you if you there will be some group of people maybe ten to [00:49:59] be some group of people maybe ten to thirty people maybe fifty people that [00:50:01] thirty people maybe fifty people that you interact with quite closely and [00:50:02] you interact with quite closely and these will be appears in the people that [00:50:05] these will be appears in the people that that will influence you the most right [00:50:07] that will influence you the most right because if you join a company with [00:50:10] because if you join a company with 10,000 people you will not 
interact with [00:50:12] 10,000 people you will not interact with all 10,000 people there will be a corps [00:50:14] all 10,000 people there will be a corps of 10 or 30 or 50 people that you [00:50:16] of 10 or 30 or 50 people that you interact with the most and is those [00:50:19] interact with the most and is those people how much they know how much in [00:50:20] people how much they know how much in teach you how hard-working they are [00:50:22] teach you how hard-working they are whether they are learning themselves [00:50:23] whether they are learning themselves that were influenced you the most rather [00:50:25] that were influenced you the most rather than all these other hypothetical 10,000 [00:50:28] than all these other hypothetical 10,000 people in a giant company and of these [00:50:31] people in a giant company and of these people one of the ones that will [00:50:33] people one of the ones that will influence you the most is your manager [00:50:35] influence you the most is your manager alright so make sure you meet your [00:50:37] alright so make sure you meet your manager and get to know them and make [00:50:38] manager and get to know them and make sure there's someone you want to work [00:50:40] sure there's someone you want to work with and in particular I wouldn't [00:50:43] with and in particular I wouldn't recommend focusing on these things and [00:50:45] recommend focusing on these things and not on the brand of the company because [00:50:54] not on the brand of the company because it turns out that the brand of the [00:50:56] it turns out that the brand of the company you work with is actually not [00:50:58] company you work with is actually not that correlated you know maybe there's a [00:51:00] that correlated you know maybe there's a very recall relation but it's actually [00:51:02] very recall relation but it's actually not that correlated with what your [00:51:03] not that correlated with what your personal experience will be like 
Right. [00:51:14] And by the way, just full disclosure: I have a research group here at Stanford, and my research group at Stanford is one of the better-known ones in the world. But don't join us just because you think we're well-known — that's just not a good reason to join us, for the brand. Instead, before you work with someone, meet the people and evaluate the individuals; look at the people and see if you think these are people you can learn from, and whether they're good people.

[00:51:59] So in today's world there are a lot of companies recruiting Stanford students, so let me give you some advice. Sometimes there are giant companies with, let's say, fifty thousand people. And I'm not thinking of any one specific company — if you're trying to guess what company I'm thinking of, there's no one specific company; this pattern matches many large companies. So maybe there's a giant company with fifty thousand people, and let's say they have a 300-person AI team. It turns out that if you look at the work of this few-hundred-person AI team, and they send you a job offer to join the 300-person AI team, that might be pretty good — this may be the group working on the papers you read about in the news. So if you get a job offer to work with this group, that might be pretty good. Although even within a 300-person AI team it's actually difficult to tell what's good and what's not — there's often a lot of variance even within it. Even better would be if you get a job offer to join the specific 30-person team, so you actually know who your manager is, who your peers are, who you're working with. And if you think these are thirty great people to learn from, that could be a great job offer.

[00:53:27] The failure mode that, unfortunately, I've seen several Stanford students go down — and this is a true story — there was one several years ago, a Stanford student whom I thought was a great guy. I knew his work; he was coding machine learning algorithms; I thought he was very sharp and did very good work with some of my PhD students. He got a job offer from one of these giant companies that has a great AI group, but his offer wasn't to go to the AI group; his offer was "join us, and we will assign you to a team." So this student — a capable student whom I knew and cared about — wound up, after accepting the job offer, being assigned to a really, um, boring Java back-end payments team, working on Java back-end payment processing systems. If that's your thing, great — but this student was assigned to that team and he was really bored. This was a student whose career I personally saw rising while he was at Stanford, and after he went to this, frankly, not very interesting team, I saw his career plateau. After about a year and a half he resigned from the company, having wasted a year and a half of his life and missed out on a year and a half of this very exciting growth of AI and machine learning. So it was very unfortunate. And actually, after I told this story the last time I taught this class earlier this year, someone in a similar position from the same big company found me and said, "Boy, Andrew, I wish you had told me this story earlier, because that's exactly what happened to me at the same big company." Now I want to share with you a different thing.

[00:55:17] So I would just be careful about rotation programs as well, when a company is trying to recruit you. If a company refuses to tell you what project you'll work on, who your manager will be, and exactly what team you're joining as an IC, I do not find those job offers that attractive. Because if they refuse to tell you what team you're going to work with, well, chances are that telling you the answer would not make the job more attractive to you — that's why they're not telling you. So I'd just be very careful. Sometimes rotation programs sound good on paper, but it's really "well, we'll figure out where to send you later." I feel like I've seen some students go into rotation programs that sound good on paper, that sound like a good idea. But just
as you wouldn't, after you graduate from Stanford, go do four internships and then apply for a job — that would be a weird thing to do — sometimes rotation programs amount to "come and do four internships, and then we'll let you apply for a job and we'll see where we send you," and then they put you on the Java back-end payment processing system. So just be cautious about the marketing of rotation programs. Again, if what they say is "do a rotation and then you join this team," then you can look at that team and say, "Yep, that's a great team; I want to do a rotation, but then I'll go work with this team, and these are the 30 people I'll work with" — that could be great. But if through a rotation they could send you anywhere in this giant company, I would just be very careful.

[00:56:43] Now, on the flip side, there are some companies — I'm not going to mention any names — that are not as glamorous, that don't have as cool a brand. Maybe it's only a 10,000-person company, or 1,000, or 50,000, or whatever. I have seen many companies that are not super well-known in the AI world and are not in the news all the time, but they may have a very elite team of a hundred people doing great work on machine learning. There are definitely companies whose brands are not the first you think of when you think of great companies, but that sometimes have a really great ten-person or fifty-person or hundred-person team all working on the algorithms. And even if the overall brand of the company, you know, is maybe even a little bit sucky, if you manage to track down this team, and if you have a job offer to join this elite team within a much bigger company, you could actually learn a lot from these people and do important work.

[00:57:48] One of the things about Silicon Valley is that the brand on your resume matters less than ever before. I guess with the exception of the Stanford brand — you totally want the Stanford brand on the resume — but with that exception, really, Silicon Valley — sorry, the world — has become really good at evaluating people on their genuine technical skills and genuine capability, and less on their brand. So I would recommend that instead of trying to get the best stamps of approval on your resume, you go ahead and take the positions that give you the best learning experiences and also allow you to do the most important work. And that is really shaped by the, you know, 30 or 50 people you work with, and not by the overall brand of the company you work with. So there's a huge variance across teams within one company, and that variance is actually pretty big — it might be bigger than the variance across different companies. And if a company refuses to tell you what team you would join, I would seriously consider — well, if you have a better option, I would do something else.

[00:58:56] And then, finally — again, I don't want to name companies, but think of some of the large retailers, or some large healthcare systems, or a lot of companies that are not well-known in the AI world — I've met their AI teams and I think they're great. And so if you're able to find those jobs and meet their people, you can actually get very exciting jobs there. But of course, for the giant companies that have big AI teams, if you can join the leading AI team, that's also great — I'm a bit biased, since I used to lead some of these leading AI teams — so I think those are great too, but only some teams, not all.

[00:59:39] Lastly, just general advice — and this is how I really live my life: I tend to choose things to work on that allow me to learn the most, and to try to do important work. Especially if you're relatively early in your career, what you learn will pay off for a long time, and so joining teams and working with a great set of 10 or 30 or 50 teammates will let you learn a lot. And also — just don't join, like, a cigarette company and help, you know, give
[01:00:33] cigarette company and help you know give more people cancer or stuff like there [01:00:35] more people cancer or stuff like there is this don't don't do this don't don't [01:00:37] is this don't don't do this don't don't do stuff like that but if you can do [01:00:39] do stuff like that but if you can do meaningful work that helps other people [01:00:40] meaningful work that helps other people and do important work and also learn a [01:00:43] and do important work and also learn a lot on the way hopefully you can find [01:00:45] lot on the way hopefully you can find positions like that that lets you set [01:00:49] positions like that that lets you set yourself up for long-term success but [01:00:50] yourself up for long-term success but also do work that you think matters in [01:00:52] also do work that you think matters in that and then helps other people [01:00:54] that and then helps other people alright um any questions it was [01:01:11] alright um any questions it was important you know yeah um I think one [01:01:15] important you know yeah um I think one of the most meaningful things you do in [01:01:16] of the most meaningful things you do in life is how people either advance the [01:01:19] life is how people either advance the human condition or help other people but [01:01:21] human condition or help other people but the thing is I'm nervous I don't a name [01:01:23] the thing is I'm nervous I don't a name one or two things because the world [01:01:25] one or two things because the world needs a lot of people who work on a lot [01:01:26] needs a lot of people who work on a lot of different things so the world's not [01:01:29] of different things so the world's not gonna function if everyone works on [01:01:30] gonna function if everyone works on computational biology I think umpire is [01:01:32] computational biology I think umpire is great but it's actually good that what [01:01:35] great but it's actually good that what people work on compile 
my PhD students [01:01:37] people work on compile my PhD students like you know mainly work on the outside [01:01:40] like you know mainly work on the outside to healthcare my team at landing er does [01:01:43] to healthcare my team at landing er does a lot of work on the outside [01:01:43] a lot of work on the outside manufacturing agriculture to some [01:01:46] manufacturing agriculture to some healthcare and some other industries I [01:01:49] healthcare and some other industries I actually especially California fires [01:01:51] actually especially California fires burning you know I actually think that [01:01:54] burning you know I actually think that there's important work to be done in AI [01:01:55] there's important work to be done in AI climate change but I think that there's [01:01:59] climate change but I think that there's a lot of them important work a lot of [01:02:01] a lot of them important work a lot of industries right so I actually think [01:02:04] industries right so I actually think that you know I should think that the [01:02:05] that you know I should think that the next wave of AI sees me I should say [01:02:08] next wave of AI sees me I should say machine learning is we've we've already [01:02:10] machine learning is we've we've already young transform a lot of the tech well [01:02:13] young transform a lot of the tech well right so you know yeah I mean we've [01:02:18] right so you know yeah I mean we've already helped a lot of the [01:02:19] already helped a lot of the circumvallate tech world become good at [01:02:22] circumvallate tech world become good at AI and that's big right how build a [01:02:23] AI and that's big right how build a couple of the teams that wound up doing [01:02:25] couple of the teams that wound up doing this right Google brain how Google [01:02:27] this right Google brain how Google become cognitive learning the battery I [01:02:29] become cognitive learning the battery I hope I do become you know couldn't one 
[01:02:32] hope I do become you know couldn't one of the greatest companies in the world [01:02:33] of the greatest companies in the world set in China and I'm very happy that [01:02:37] set in China and I'm very happy that between me and so my friends in the [01:02:39] between me and so my friends in the industry we've made a lot of good AI [01:02:41] industry we've made a lot of good AI companies I think part of the next phase [01:02:43] companies I think part of the next phase for the evolution of machine learning is [01:02:46] for the evolution of machine learning is fair to go into not just to check [01:02:48] fair to go into not just to check companies like you know like the Google [01:02:51] companies like you know like the Google and Baidu which I hope this was you know [01:02:52] and Baidu which I hope this was you know Facebook Microsoft which had nothing to [01:02:54] Facebook Microsoft which had nothing to do as well as well that was a BMP [01:02:57] do as well as well that was a BMP Pinterest ruber right all these like [01:02:59] Pinterest ruber right all these like great companies that hope they'll all [01:03:00] great companies that hope they'll all embrace a yard but I think some of the [01:03:02] embrace a yard but I think some of the most exciting work to be done stores [01:03:03] most exciting work to be done stores also look outside to check industry and [01:03:06] also look outside to check industry and to look at all the sometimes calling [01:03:08] to look at all the sometimes calling traditional industries that do not have [01:03:10] traditional industries that do not have shiny tech things because I think the [01:03:13] shiny tech things because I think the value creation there as surprised you [01:03:15] value creation there as surprised you could implement there maybe even bigger [01:03:18] could implement there maybe even bigger than if you you know yeah I mention one [01:03:23] than if you you know yeah I mention one interesting thing 
one thing I notice is [01:03:25] that many of the large tech companies all work on the same problems, right? Everyone works on machine translation, everyone works on speech recognition, face detection, click-through rate prediction. Part of me feels like this is great, because it means there's a lot of progress in machine translation, and that's great, we do want progress in machine translation. But sometimes when we look at other industries, you know, when you look at manufacturing, or at how some of the medical devices work, or sometimes when I hang out with farmers on farms, I feel like in my own work, my team's work, we're sometimes stumbling across brand-new research problems that the big tech companies do not see and have not yet even framed. So I find one way to be in search of exciting challenges is actually to be constantly on the cutting edge, looking at these types of
problems. It's a different cutting edge than the cutting edge at the big tech companies. So I think some of you are joining big tech companies, and that's great, we need more AI in the big companies and the tech companies, but I think a lot of the exciting work to do in AI is also outside what we traditionally considered tech. All right, we're at time, so I hope this was helpful. Let's break for today. Have a great Thanksgiving, everyone, and we'll see you in a couple of weeks.

================================================================================
LECTURE 009
================================================================================
Stanford CS230: Deep Learning | Autumn 2018 | Lecture 9 - Deep Reinforcement Learning
Source: https://www.youtube.com/watch?v=NP2XqpgTJyo
---
Transcript

[00:00:05] Hi everyone, and welcome to lecture nine of CS 230. Today we're going to discuss an advanced topic that is kind of the marriage between deep learning and another field of AI, which is reinforcement learning, and we will see a practical application
and how deep learning methods can be plugged into another family of algorithms. It's interesting because deep learning methods and deep neural networks have been shown to be very good function approximators; essentially that's what they are: we give them data so that they can approximate a function. There are a lot of different fields which require these function approximators, and deep learning methods can be plugged into all of them; this is one of those examples. So we'll first motivate the setting of reinforcement learning: why do we need reinforcement learning, why can't we use deep learning methods to solve everything? There is a set of problems that we cannot solve with deep learning alone, and reinforcement learning applications are examples of that. We will see an example to introduce a reinforcement learning
algorithm called Q-learning, and we will add deep learning to this algorithm to make it deep Q-learning. As we've seen with generative adversarial networks and also with deep neural networks in general, most models are hard to train: we had to come up with initialization schemes, with dropout, with batch norm, and with myriads of methods to make deep neural networks train, and with GANs we had to use tricks and hacks as well in order to train them. So here we will see some of the tips and tricks to train deep Q-learning, which is a reinforcement learning algorithm. At the end we will have a guest speaker coming to talk about advanced topics, mostly research, combining deep learning and reinforcement learning. Sounds good? Okay, let's go. So deep reinforcement learning is a very recent field, I would say, although
reinforcement learning has existed for a long time; only recently has it been shown that using deep learning to approximate the functions that play a big role in reinforcement learning algorithms works well. One example is AlphaGo, and you've probably all heard of it: Google DeepMind's AlphaGo has beaten world champions at the game of Go, which is a very old strategy game. The one on the right here, or on your right, "Human-level control through deep reinforcement learning," is also a Google DeepMind paper that came out and hit the headlines on the front page of Nature, which is one of the leading multidisciplinary peer-reviewed journals in the world. They've shown that with deep learning plugged into a reinforcement learning setting, they can train an agent that beats human level on a variety of games; in fact, these are Atari games. So
they've actually shown that their algorithm, the same algorithm reproduced for a large number of games, can beat humans on most of these games, though not all of them. So these are two examples; although they use different sub-techniques of reinforcement learning, they both include some deep learning aspect. Today we will mostly talk about human-level control through deep reinforcement learning, also called the deep Q-network, presented in this paper. So let's start by motivating reinforcement learning using the AlphaGo setting. This is a board of Go, and the picture comes from the DeepMind blog. You can think of Go as a strategy game where you have a grid that is up to 19 by 19, and you have two players: one player has white stones and one player has black stones. At every step in the game you can position a stone on the board, on
one of the grid crossings. The goal is to surround your opponent, that is, to maximize your territory by surrounding your opponent, and it's a very complex game for different reasons. One reason is that you cannot be short-sighted in this game; you have to have a long-term strategy. Another reason is that the board is so big; it's much bigger than a chessboard, right? A chessboard is 8 by 8. So let me ask you a question: if you had to build an agent that solves this game and beats humans, or at least plays very well, with the deep learning methods that you've seen so far, how would you do that? [00:05:33] Someone wants to try? So let's say you have to collect a data set, because in classic supervised learning we need a data set with x and y. What do you think would be your x and y? [00:05:58] Yeah, okay: the input is the game board and the output is the probability of victory in that position. So that's a good one, I think
[00:06:05] an input-output pair. What's the issue with that one? So yeah, it's super hard to represent what the probability of winning is from this board. Nobody can tell; even if I ask an expert human to come and tell us what the probability of black winning or white winning is in this setting, they wouldn't be able to. So this is a little more complicated. Any other ideas for data sets? Yep, okay, good point: we could take the grid, like this one, as the input, and the output would be the move, the next action taken by, probably, a professional player. So we would just watch professional players playing, record their moves, and build a data set of what a professional move is, and we would hope that our network, using this input-output, will at some point learn how professional players play, and given an input state of the board will be
able to decide on the next move. What's the issue with that? [00:07:30] Yes: you need a whole lot of data. Why? And you said it: because we basically need to represent all types of positions of the board, all the states. So let's do that: if we were to compute the number of possible states of this board, what would it be, for a 19-by-19 board? [00:08:06] Remember what we did with adversarial examples; we did it for pixels, and now we're doing it for a board. Yes, you want to try? Yeah, 3 to the power of 19 times 19, that is, 3 to the 361. And why is that? Each spot, and there are 19 times 19 spots, can have 3 states: no stone, white stone, or black stone. So this is all the possible states, and it's about 10 to the 172, so it's super, super big. We can probably not get even close to that by observing professional players.
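As a quick sanity check on that count, here is a short snippet of mine (not from the lecture):

```python
import math

# Each of the 19 x 19 = 361 intersections can be empty, hold a white stone,
# or hold a black stone, so an upper bound on the number of board
# configurations is 3**361.
num_states = 3 ** (19 * 19)
print(f"about 10^{math.log10(num_states):.0f}")  # prints: about 10^172
```

(The number of *legal* Go positions is somewhat smaller, but still astronomically large, far beyond anything a data set of professional games could cover.)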
First, because we don't have enough professional players, and because we're humans and we don't have infinite lives, the professional players cannot play forever; they might get tired. So one issue is that the state space is too big. Another one is that the ground truth would probably be wrong: it's not because you're a professional player that you will play the best move every time, right? Every player has their own strategy, so the ground truth we're getting here is not necessarily true, and our network might not be able to beat these human players; what we're looking for here is an algorithm that beats humans. Okay, so the second issue: too many states in the game, as you mentioned. And the third one: we will likely not generalize. The reason we will not generalize is that in classic supervised learning we're looking for patterns: if I ask you to build an algorithm to detect cats versus dogs, it will look for
what the pattern of a cat is versus what the pattern of a dog is, in the convolutional filters we learn. In this case it's about a strategy, not a pattern: you have to understand the process of winning this game in order to make the next move, and you cannot generalize if you don't understand this process of long-term strategy. So we have to incorporate that, and that's where RL comes into place. RL is reinforcement learning, a method that could be described in one sentence as automatically learning to make good sequences of decisions. So it's about the long term, not the short term, and we would use it generally when we have delayed labels, like in this game: the label that you mentioned at the beginning was the probability of victory, which is a long-term label; we cannot get this label now, but over time, the closer we get to the end, the better we
are at seeing the victory or not. And it's for making sequences of decisions: we make a move, then the opponent makes a move, then we make another move, and all these decisions are correlated with each other; you have to plan in advance, and when you're human you do that when you play chess or Go. So examples of RL applications include robotics, and it's still a research topic how deep RL can change robotics, but think about having a robot walking from here, and you want to send it there. What you're teaching the robot is: if you get there, it's good, you've achieved the task. But I cannot give you the probability of getting there at every point; I can help you out by giving you a reward when you arrive there and letting you do trial and error. So the robot will try: randomly initialized, the
robot might just fall down at first and get a negative reward; then it repeats, and this time the robot knows that it shouldn't fall down, it shouldn't go that way, it should probably go this way. So through trial and error and long-term reward, the robot is supposed to learn this pattern. Another application is games, and that's the one we will see today: games can be represented as a set of rewards for a reinforcement learning algorithm. This is where you win, this is where you lose; we let the algorithm play and figure out what winning means and what losing means until it learns. The problem with using plain deep learning is that the algorithm will not learn, because the reward is too long-term, so we use reinforcement learning. And finally, advertising: a lot of advertising is real-time bidding, so given a budget, you want to know when to invest this budget, and this is a long-
term strategy-planning problem as well, which reinforcement learning can help with. Okay, so this was the motivation for reinforcement learning. Now we're going to jump to a concrete example, a super-vanilla example, to understand Q-learning. So let's start with this game, or environment; we generally call it an environment, and it has several states, in this case five states. We have these states and we can define rewards, which are the following. So what is our goal in this game? We define it as maximizing the return, or the reward, over the long term. And what is the reward? It's the numbers that you have here, which were defined by a human; this is where the human defines the reward. Now, what's the game? The game has five states. State one is a trash can and has a reward of plus two. State two is the starting state, the initial state, and we assume that we start in the initial state with
[00:13:26] a plastic bottle in our hand. The goal will be to throw this plastic bottle in a can: if it's the trash can, we get plus two; if we get to state five, the recycle bin, we get plus ten, a super important application. State four has a chocolate, so if you go to state four you get a reward of one, because you can eat the chocolate, and you can also then throw the chocolate in the recycle bin, hopefully. That's the setting; makes sense? So these states are of three types: the starting or initial state, which is brown; the normal state, which is neither a starting nor an ending state, and it's gray; and the blue states, which are terminal states, so if we get to a terminal state we end the game, or the episode, let's say. That's the setting; makes sense? Okay, and there are two possible actions: you have to move
either to the left or to the right. An additional rule we'll add is that the garbage collector will come in three minutes, and every step takes you one minute, so you cannot spend more than three minutes in this game; in other words, you cannot stay at the chocolate and eat chocolate forever, you have to move at some point. Okay, so one question I have is: how do you define the long-term return? Because we said we want a long-term return; we don't care about short-term returns. What do you think is a good way to define the long-term return here? [00:15:12] Yeah: the sum of how many points you have when you reach the terminal state. So let's say I'm in state 2 and I have 0 reward right now; if I reach the terminal state on your left, the plus 2, I get a plus-2 reward and I finish the game.
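The environment just described, with its five states, left/right actions, and three-step limit, can be sketched in a few lines of Python. This is my own sketch for illustration; the state numbering and function names are not from the lecture.

```python
# State 1 = trash can (+2, terminal), 2 = start, 3 = empty cell,
# 4 = chocolate (+1), 5 = recycle bin (+10, terminal).
REWARDS = {1: 2, 2: 0, 3: 0, 4: 1, 5: 10}
TERMINAL = {1, 5}
MAX_STEPS = 3  # the three-minute garbage-collector rule

def step(state, action, t):
    """action is -1 (left) or +1 (right); returns (next_state, reward, done)."""
    next_state = min(max(state + action, 1), 5)
    reward = REWARDS[next_state]
    done = next_state in TERMINAL or t + 1 >= MAX_STEPS
    return next_state, reward, done

# Walk right from the start state 2: 2 -> 3 -> 4 -> 5, collecting 0, 1, 10.
state, t, rewards, done = 2, 0, [], False
while not done:
    state, r, done = step(state, +1, t)
    rewards.append(r)
    t += 1
print(rewards)  # prints: [0, 1, 10]
```

Going left instead ends the episode immediately: `step(2, -1, 0)` returns `(1, 2, True)`, the plus-2 terminal state.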
[00:15:39] If instead I go to the right and reach the plus 10, you're saying that the long-term return can be the sum of all the rewards I got on the way there, so plus 11. So this is one way to define the long-term return. Any other ideas? [00:16:05] We probably want to incorporate the time steps and reduce the reward as time passes, and in fact this would be called a discounted return, versus what you said, which would be called a return. Here we use a discounted return, that is, r1 + gamma*r2 + gamma^2*r3 + ... for a discount factor gamma between 0 and 1, and it has several advantages. Some are mathematical: the return you described, which is not discounted, might not converge, it might go up to plus infinity, while the discounted return will converge with an appropriate discount. Also, intuitively, why is the discounted return intuitive? Because time is always an important factor in our decision making: people prefer cash now to cash in 10 years, right? Or, similarly, you can consider that the robot has a
limited life expectancy: it has a battery and loses battery every time it moves, so you want to take this into account. [00:17:03] If I can eat a chocolate that is close, I go for it, because if the chocolate is too far I might not get there, since I'm losing some battery, some energy, for example. So this is the discounted return. [00:17:12] Now, if we take gamma equals 1, which means we have no discount, the best strategy to follow in this setting seems to be to go to the right, starting in the initial state 2, right? And the reason is a simple computation: on one side I get plus 2, on the other side I get plus 11. What if my discount was 0.1? Which one would be better? [00:17:43] Yeah, the left would be better, going directly to the plus 2, and the reason is because we compute in our mind: we just do 0 plus 0.1 times 1, which gives us 0.1, plus 0.1 squared
times 10, and it's less than 2, we know it. [00:18:04] Okay, so now we're going to assume that the discount is 0.9; it's a very common discount to use in reinforcement learning, and it's the discount we'll use here. [00:18:13] So the general question here, and it's the core of reinforcement learning, in this case of Q-learning, is: what do we want to learn? Really think of it as a human: what would you like to learn? What are the numbers you need to have in order to be able to make decisions really quickly, assuming you had a lot more states and actions than this? [00:18:38] Any ideas of what we want to learn? What would help our decision-making? [00:18:55] The optimal action at each state. Yeah, that's exactly what we want to learn: for a given state, tell me the action that I should take. For that I need to have a score for all the actions in every state, and in order to store these scores we need a matrix, right? So this is our matrix.
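As a quick check of the two comparisons just made, the discounted return of a path of rewards can be sketched in a few lines (the helper name and the path lists are mine, following the example's rewards):

```python
def discounted_return(rewards, gamma):
    """Sum of rewards along a path, discounting by gamma for each extra step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# From state 2: going left reaches the +2 terminal in one step;
# going right collects 0 (enter s3), then 1 (enter s4), then 10 (enter s5).
left, right = [2], [0, 1, 10]

print(discounted_return(right, 1.0))  # 11.0: beats the 2 on the left, go right
print(discounted_return(right, 0.1))  # 0 + 0.1*1 + 0.01*10 ~ 0.2: now go left
```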
[00:19:13] We will call it a Q-table. It's going to be of shape number of states times number of actions. If I have this matrix of scores, and the scores are correct, then if I'm in state 3 I can look at the third row of this matrix and see what's the maximum value I have: is it the first one or the second one? If it's the first one, I go to the left; if it's the second one that is maximum, I go to the right. This is what we would like to have. Does that make sense, [00:19:41] this Q-table? So now let's try to build the Q-table for this example. If you had to build it, you would first think of it as a tree. Oh, and by the way, every entry of this Q-table tells you how good it is to take this action in that state: the state corresponding to the row, the action corresponding to the column. [00:20:03] So now, how do we get there? We can build a tree, and that's similar to what we would do in our mind. We start in s2. In s2 we have two
options: either we go to s1 and we get 2, or we go to s3 and we get 0. From s1 we cannot go anywhere, it's a terminal state; but from s3 we can go to s2 and get 0 by going back, or we can go to s4 and get 1. Does that make sense? From s4, same thing: we can get 0 by going back to s3, or we can go to s5 and get plus 10. [00:20:39] Now, here I just have my immediate reward for every state. What I would like to compute is the discounted return for all the states, because ultimately what should lead my decision-making in a state is: if I take this action, I get to a new state, and what's the maximum reward I can get from there in the future, not just the reward I get in that state? If I take the other action, I get to another state; what's the maximum reward I could get from that state, not just the immediate reward I get from going to that state? [00:21:10] So here's what we would do; we can do it together.
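The game just described can be written down as a tiny environment sketch. The state numbering, the reward placement (a reward is granted on entering a state), and the function name are my reading of the example, not code from the course:

```python
# States s1..s5; entering s1 pays +2, s4 pays +1, s5 pays +10.
ENTER_REWARD = {1: 2, 2: 0, 3: 0, 4: 1, 5: 10}
TERMINAL = {1, 5}  # the game ends at either chocolate

def step(state, action):
    """Take 'left' or 'right'; return (next_state, reward, done)."""
    next_state = state - 1 if action == "left" else state + 1
    return next_state, ENTER_REWARD[next_state], next_state in TERMINAL

# Walking right from s2 all the way to the +10 chocolate:
state, total, done = 2, 0, False
while not done:
    state, reward, done = step(state, "right")
    total += reward
print(total)  # 0 + 1 + 10 = 11
```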
[00:21:12] Let's say we want to compute the value of the actions from s3, going right and left. From s3 I can either go to s4 or s2. Going to s4, I know that the immediate reward is 1, and I know that from s4 I can get plus 10, which is the maximum I can get. So I can discount this 10: 10 times 0.9 gives us 9, plus 1, which was the immediate reward, gives us 10. So 10 is the score that we give to the action 'go right' from state s3. [00:21:48] Now, what if we do it from one step before, from s2? From s2 I know that I can go to s3; at s3 I get 0 reward, so the immediate reward is 0, but I know that from s3 I can ultimately get 10 reward in the long term. I need to discount this reward by one step, so I multiply this 10 by 0.9 and get 0 plus 0.9 times 10, which gives me 9. So now, in state 2, going right will give us a long-term reward of 9. Makes sense? [00:22:18] And you do the same thing: you can copy
back: going from s4 to s3 will give you 0 plus the maximum you can get from s3, which was 10, discounted by 0.9. Or you can do it for going back to s2: from s2 I can go left and get plus 2, or go right and get 9; the immediate reward of going back would be 0, and I discount the 9 by 0.9 and get 8.1. So that's the process we would follow to compute this, and you see that it's an iterative algorithm: [00:22:50] I just copy back all these values into my matrix, and now if I'm in state 2 I can clearly say that the best action seems to be to go to the right, because its long-term discounted reward is 9, while the long-term discounted reward for going to the left is 2. [00:23:09] And I'm done, that's Q-learning! I solved the problem I had: I had a problem statement, and I found a matrix that tells me in every state what action I should take. I'm fine.
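The copy-back procedure just walked through is, in effect, a small value-iteration loop over the Q-table. A sketch under the same assumptions (gamma = 0.9, rewards granted on entering s1, s4, and s5, with s1 and s5 terminal):

```python
GAMMA = 0.9
ENTER_REWARD = [2, 0, 0, 1, 10]   # s1..s5, indexed 0..4
TERMINAL = {0, 4}                 # s1 and s5 end the game

# Q[s][a]: action a=0 is "left" (to s-1), a=1 is "right" (to s+1)
Q = [[0.0, 0.0] for _ in range(5)]

for _ in range(100):  # sweep until the copied-back values converge
    for s in range(1, 4):  # only s2..s4 are non-terminal
        for a, s_next in ((0, s - 1), (1, s + 1)):
            future = 0.0 if s_next in TERMINAL else max(Q[s_next])
            Q[s][a] = ENTER_REWARD[s_next] + GAMMA * future

print(Q[1])  # s2: left 2.0, right 9.0  -> go right
print(Q[2])  # s3: left 8.1, right 10.0 -> go right
print(Q[3])  # s4: left 9.0, right 10.0 -> go right
```

These are exactly the entries worked out on the board: 9 for going right from s2, 8.1 for going back from s3, and so on.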
[00:23:26] So why do we need deep learning? It's a question we will try to answer. So, the best strategy to follow with 0.9 is still right, right, right, and the way I see it is that I just look at my matrix at every step and I always follow the maximum of my row. From state 2, 9 is the maximum, so I go right; from state 3, 10 is the maximum, so I still go right; and from state 4, 10 is the maximum, so I go right again. So I take the maximum over all the actions in a specific state. [00:23:58] Okay, now one interesting thing to note is that when you do this iterative algorithm, at some point it should converge, and ours converged to some values that represent the discounted rewards for every state and action. There is an equation that this Q function follows, and we know that the optimal Q function follows this equation; the one we have here follows this equation. It is called the Bellman equation, and it has
two terms: one is R, and one is [00:24:27] the discount times the maximum of the Q scores over all the actions. So how does that make sense? Given that you are in state s and want to know the score of taking action a in this state, the score should be the reward that you get by going there, plus the discount times the maximum you can get in the future. That's exactly what we used in the iteration. Does this Bellman equation make sense? [00:24:58] Okay, so remember, this is going to be very important in Q-learning, this Bellman equation. It's the equation that is satisfied by the optimal Q-table or Q function, and if you try all these entries you will see that they follow this equation. [00:25:10] When the Q function is not optimal, it's not following this equation yet; we would like it to follow this equation. Another point of vocabulary in reinforcement learning is a policy. A policy is denoted pi, sometimes
or mu; and, sorry: [00:25:29] pi of s is equal to the argmax over the actions of the optimal Q. So what does it mean? It means it's exactly our decision process: given that we're in state s, we look at all the columns of this state s in our Q-table, we take the maximum, and this is what pi of s is telling us. It's telling us: this is the action you should take. So pi, our policy, is our decision-making, okay? It tells us what's the best strategy to follow in a given state. Any questions so far? [00:26:05] Okay, and so I have a question for you: why is deep learning helpful? Yes, that's very true: the number of states is way too large to store a table like that. If you have a small number of states and a small number of actions, then it's easy: you can use a Q-table; you can, at every state, look into the Q-table, which is super quick, and find out what you should do. But ultimately, this Q-table will get bigger and bigger, depending on the application, right?
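Putting the two definitions together, a short sketch (variable names mine, table values the ones worked out earlier) can check that the finished table satisfies the Bellman equation, Q(s, a) = r + gamma * max over a' of Q(s', a'), and read the policy pi(s) off each row:

```python
GAMMA = 0.9
ACTIONS = ["left", "right"]
# Worked-out Q-table from the example: rows s2..s4, columns [left, right].
Q = {2: [2.0, 9.0], 3: [8.1, 10.0], 4: [9.0, 10.0]}
ENTER_REWARD = {1: 2, 2: 0, 3: 0, 4: 1, 5: 10}
TERMINAL = {1, 5}

def bellman_rhs(s, a):
    """r + gamma * max_a' Q(s', a') for one (state, action) entry."""
    s_next = s - 1 if a == 0 else s + 1
    future = 0.0 if s_next in TERMINAL else max(Q[s_next])
    return ENTER_REWARD[s_next] + GAMMA * future

def policy(s):
    """pi(s) = argmax_a Q*(s, a): the best-scoring column of row s."""
    return ACTIONS[Q[s].index(max(Q[s]))]

for s in Q:
    for a in range(2):
        assert abs(Q[s][a] - bellman_rhs(s, a)) < 1e-9  # the table is optimal
print([policy(s) for s in Q])  # ['right', 'right', 'right']
```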
[00:26:42] And the number of states for Go is 10 to the power 170, approximately, which means that this matrix should have a number of rows equal to a 1 with 170 zeros after it. You know what I mean, it's very big. And the number of actions is also going to be bigger: in Go you can place your stone anywhere on the board that is available, of course. [00:27:06] Okay, so: way too many states and actions. So we would need to come up with maybe a function approximator that can give us the action based on the state, instead of having to store this matrix. That's where deep learning will come in. [00:27:24] So, just to recap these first 30 minutes in terms of vocabulary: we learned what an environment is, it's the general game definition; an agent is the thing we're trying to train, the decision-maker; a state, an action, a reward, a total return, a discount factor; the Q-table, which is the matrix of entries representing how
[00:27:46] the matrix of entries representing how good it is to take action a in state s a [00:27:49] good it is to take action a in state s a policy which is our decision making [00:27:51] policy which is our decision making function telling us what's the best [00:27:53] function telling us what's the best strategy to apply in a state and Batman [00:27:55] strategy to apply in a state and Batman equation which is satisfied by the [00:27:57] equation which is satisfied by the optimal cue table now we will tweak this [00:28:00] optimal cue table now we will tweak this cue table into a cue function and that's [00:28:03] cue table into a cue function and that's where we shift from cue learning to deep [00:28:06] where we shift from cue learning to deep cue learning so find a cue function to [00:28:09] cue learning so find a cue function to replace the few table ok so this is the [00:28:13] replace the few table ok so this is the setting we have our problem statement we [00:28:15] setting we have our problem statement we have our cue table we want to change it [00:28:17] have our cue table we want to change it into a function approximator that will [00:28:20] into a function approximator that will be our neural network does that make [00:28:24] be our neural network does that make sense how deep learning comes into [00:28:26] sense how deep learning comes into reinforcement learning here so now we [00:28:29] reinforcement learning here so now we take a state as input for word [00:28:31] take a state as input for word propagated in the deep network and get [00:28:34] propagated in the deep network and get an output which is an action an action [00:28:38] an output which is an action an action score for all the actions it makes sense [00:28:42] score for all the actions it makes sense to have an output layer that is the size [00:28:45] to have an output layer that is the size of the number of actions because we [00:28:48] of the number of actions because we don't want to 
We don't want to give an action as input together with the state and get the score for just that action taken in that state; instead, we can be much quicker: you just give the state as input, get the whole distribution of scores over the output, and select the maximum of this vector, which will tell us which action is best. [00:29:10] So if we're in state 2, let's say, and we forward propagate state 2, we get two values, which are the scores of going left and right from state 2; we can select the maximum of those and it will give us our action. [00:29:27] The question is how to train this network. We know how to train it, we've been learning it for nine weeks: compute the loss, backpropagate. Can you guys think of some issues that make this setting different from a classic supervised learning setting? [00:29:52] The reward changes dynamically? So, the reward
[00:29:56] doesn't change; the reward is set, you define it at the beginning, and it doesn't change, okay? But I think what you meant is that the Q score changes dynamically. That's true, the Q scores change dynamically, but that's probably okay, because our network changes too: our network is now the Q score, so when we update the parameters of the network, it updates the Q scores. What's another issue that we might have? [00:30:21] No labels. Remember, in supervised learning you need labels to train your network. What are the labels here? And don't say 'compute the Q-table and use it as labels', that's not going to work, okay? [00:30:41] So that's the main issue that makes this problem very different from classic supervised learning. So let's see how deep learning can be tweaked a little, and we want you to see these techniques because they're helpful when you read a variety of research papers. We have our network:
[00:30:59] given a state, it gives us two scores that represent the actions of going left and right from this state. The loss function that we'll define: is it a classification problem or a regression problem? [00:31:06] A regression problem, because the Q score doesn't have to be a probability between 0 and 1; it's just a score that you want to give, and it should mimic the long-term discounted reward. So in fact, the loss function we can use is the L2 loss function: y minus the Q score, squared. So let's say we do it for the Q of going to the right. [00:31:37] The question is: what is y, what is the target for this Q? Remember, what I copied at the top of this slide is the Bellman equation. We know that the optimal Q should follow this equation, we know it. [00:31:54] The problem is that this equation depends on its own Q; you have Q on both sides of the equation. It means
if you set the label to be R plus gamma times the max of Q, then when you backpropagate you will also have a derivative there. Let me go into the details and define the target value. [00:32:13] Let's assume that going left is better than going right at this point in time: we initialize the network randomly, we forward propagate state 2 in the network, and the Q score for left is more than the Q score for right, so the action we will take at this point is going left. [00:32:30] Let's define our target y as the reward you get when you go left, the immediate one, plus gamma times the maximum of all the Q values you get from the next state. [00:32:53] Let me spend a little more time on that, because it's a little complicated. I'm in s, I move to s-next using a move to the left, I get an immediate reward R, and I also get a new state, s prime, s-next, which I can forward propagate in the network.
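Under those definitions, computing the moving target and the L2 loss for one transition can be sketched as follows (the names are illustrative, and the terminal-state case, where the target is just r, is a standard detail I've added):

```python
GAMMA = 0.9

def dqn_target(reward, next_q_scores, done):
    """Moving label y = r + gamma * max_a' Q(s_next, a'); just r at a terminal."""
    return reward if done else reward + GAMMA * max(next_q_scores)

def l2_loss(q_value, y):
    """Regression loss (y - Q)^2 between the current score and the proxy label."""
    return (y - q_value) ** 2

# Hypothetical transition: in s3 we move left to s2 for reward 0, and forward
# propagating s2 gives Q scores [2.0, 9.0] for (left, right).
y = dqn_target(0.0, [2.0, 9.0], done=False)   # 0 + 0.9 * 9 = 8.1
loss = l2_loss(5.0, y)                        # the gap that backprop will shrink
print(y, loss)
```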
[00:33:11] And you see what is the maximum I can get from this state: take the maximum value and plug it in here. So this is, hopefully, what the optimal Q should follow; it's a proxy for a good label. We know that the Bellman equation tells us the best Q satisfies this equation, when in fact this equation is not true yet, because the true equation would have Q star here, not Q; Q star is the optimal Q. [00:33:42] What we hope is that if we use this proxy as our label, and we learn the difference between where we are now and this proxy, we can then update the proxy, get closer to optimality, train again, update the proxy, get closer to optimality, train again, and so on. Our only hope is that this will converge. [00:34:00] So, does it make sense how this is different from deep learning? The labels are moving; they're not static labels. We define a label to be a best guess of what would be the best Q
function we have. [00:34:19] Then we compute the loss between where the Q function is right now and this guess, and we backpropagate so that the Q function gets closer to our best guess. Now that we have a better Q function, we can make a better guess, so we make a better guess and fix it; we compute the difference between the Q function we have and our best guess, we backpropagate, we get closer to our best guess, and we can update our best guess again. We hope that doing this iteratively will end with convergence, and with a Q function that is very close to satisfying the Bellman equation, the optimal Bellman equation. Does it make sense? This is the most complicated part of Q-learning. [00:34:59] Yeah: we generate the output of the network, we get the Q function, and we compare it to the best Q function, the one that we think satisfies the Bellman equation. We don't know it, but we
don't, but we guess it based on the Q we have. [00:35:26] Basically, when you have Q you can compute this Bellman expression, and it will give you some values. These values are probably closer to where you want to get than where you are now; where you are now is further from this optimality, and you want to close that gap, so you backpropagate. [00:35:49] Yes? The question is: is there a possibility for this to diverge? That's a broader discussion that would take a full lecture to prove, so I put a paper here by Francisco Melo which proves the convergence of this algorithm. It does converge, and in fact it converges because we're using a lot of tips and tricks that we will see later. If you want to see the math behind it, and it is a full lecture of proof, I invite you to look at this simple proof of convergence of the Bellman equation. OK. So
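The moving-label idea can be sketched in miniature with a tabular Q and no neural network: rebuild the Bellman proxy targets y(s, a) from the current Q, move Q onto them, and repeat with the improved proxy. The chain MDP below is invented purely for illustration.

```python
GAMMA = 0.9

# Toy deterministic 4-state chain (hypothetical): actions 0 = left,
# 1 = right; entering state 3 gives reward 1 and is terminal.
def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s_next, (1.0 if s_next == 3 else 0.0), s_next == 3

Q = [[0.0, 0.0] for _ in range(4)]
for sweep in range(20):
    # Build the moving labels y(s, a) from the CURRENT Q (the Bellman proxy)...
    y = [[0.0, 0.0] for _ in range(4)]
    for s in range(3):            # state 3 is terminal; no actions from it
        for a in (0, 1):
            s_next, r, done = step(s, a)
            y[s][a] = r if done else r + GAMMA * max(Q[s_next])
    # ...then move Q onto those labels and repeat with a better proxy.
    for s in range(3):
        for a in (0, 1):
            Q[s][a] = y[s][a]

# The proxy labels stop moving once Q satisfies the Bellman equation:
# Q[2][1] -> 1.0, Q[1][1] -> 0.9, Q[0][1] -> 0.81.
```

After a few sweeps the labels and Q agree, which is exactly the fixed point the lecture is hoping the deep version converges to.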
this is the case where the left score is higher [00:36:24] than the right score, and we have two terms in our target: the immediate reward for taking action left, and the discounted maximum future reward when you are in state s_next. OK. [00:36:41] The tricky part is this: let's say we compute that. We can do it; we have everything we need to compute our target. We have R, which is defined by the human at the beginning, and we can also get this number, because we know that if we take action left we get s_next, we forward propagate s_next in the network, and we take the maximum output. So we have everything in this equation. The problem now is: if I plug this and my Q score into my loss function and I ask you to backpropagate, backpropagation is W = W - alpha times the derivative of the loss function with respect to W, the parameters of the network.
Which term will have a nonzero [00:37:25] value? Obviously the second term, Q(s, left), will have a nonzero value, because it depends on the parameters of the network, W. But y will also have a nonzero value, because you have Q in it. So how do you handle that? You actually get a feedback loop in this backpropagation that makes the network unstable. What we do is consider this Q fixed: the Q in our target is going to be fixed for many iterations, say a million or a hundred thousand iterations, until we get close to it and our gradient is small; then we will update it and fix it again. So we actually have two networks in parallel: one that is fixed and one that is not fixed. [00:38:13] OK. The second case is similar: if the Q score to go right were higher than the Q score to go left, we would define our target as the
immediate [00:38:25] reward of going to the right, plus gamma times the maximum Q score we get if we're in the next state and take the best action. Does this make sense? This is the most complicated part of Q-learning; this is the hard part to understand. So: the immediate reward to go right, plus the discounted maximum future reward when you're in state s_next. I'm going to draw it. This part is held fixed for backprop, so no derivative. If we do that, then no problem: y is just a number. [00:38:56] We come back to our original supervised learning setting: y is a number, we compute the loss, and we backpropagate. No difference. OK. So compute dL/dW and update W using a stochastic gradient descent method: RMSprop, Adam, whatever you guys want. [00:39:18] Now let's go over the full DQN, deep Q-network, implementation. This slide is pseudocode to help you understand how this
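One such gradient step can be sketched with a tiny linear Q-function (all numbers here are hypothetical): the target y is built from a frozen copy of the weights, so it is just a constant in the derivative, and only the live Q contributes to dL/dW.

```python
GAMMA, LR = 0.9, 0.01

w = [0.2, -0.1]        # trainable weights, one per action (toy values)
w_tgt = list(w)        # frozen copy, used only to build the target y

def q(weights, s, a):
    # Toy linear Q-function: Q_w(s, a) = w[a] * s
    return weights[a] * s

# One observed transition (s, a, r, s_next), invented for illustration.
s, a, r, s_next = 1.0, 0, 0.5, 2.0

# Target from the FROZEN network: for backprop, y is just a number.
y = r + GAMMA * max(q(w_tgt, s_next, 0), q(w_tgt, s_next, 1))

# Squared loss L = (y - Q_w(s, a))^2; no derivative flows through y, so
# dL/dw[a] = -2 * (y - Q_w(s, a)) * s.
grad = -2.0 * (y - q(w, s, a)) * s
w[a] = w[a] - LR * grad          # W = W - alpha * dL/dW
```

Note that `w_tgt` is untouched by the step; in the full algorithm it is only refreshed every many iterations.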
entire algorithm works. [00:39:32] We will actually plug many methods into this pseudocode, so please focus right now; if you understand this, you understand the entire rest of the lecture. We initialize our two networks' parameters, just as we initialize a network in deep learning. We loop over episodes. Let's define an episode to be one game, going from the start to a terminal state; that's one episode. We can also sometimes define episodes to be many states, like in Breakout, the game with the paddle: it's usually 20 points, and the first player to get 20 points finishes the game, so an episode would be 20 points. [00:40:09] Inside the loop over episodes, start from an initial state s; in our case there is only one initial state, which is state two. Then loop over time steps: forward propagate s (state two) through the Q network, execute the action a which has the maximum Q score, observe the immediate reward R and
the next state s'. [00:40:34] Compute the target y; to compute y, we know that we need to take s' and forward propagate it through the network again. Then compute the loss function and update the parameters with gradient descent. Does this loop make sense? It's very close to what we do in general; the only difference is this part: we compute the target y using a double forward propagation, so we forward propagate two times in each loop. [00:41:01] Do you guys have any questions on this pseudocode? [00:41:12] OK, so we will now see a concrete application of a deep Q-network. That was the theoretical part; now we're going to the practical part, which is going to be more fun. Let's look at this game. It's called Breakout. The goal when you play Breakout is to destroy all the bricks without letting the ball pass the line at the bottom. We have a paddle, and our decisions can be: idle, i.e., stay where
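The loop just described, together with the frozen target network from earlier, can be sketched end to end. Here a tabular Q on an invented 5-state chain stands in for the neural network, so the "two forward passes" per time step are just table lookups; the episode loop, the double evaluation, and the periodic target sync are the point.

```python
import random

GAMMA, ALPHA, SYNC_EVERY = 0.9, 0.5, 50
N_STATES, TERMINAL = 5, 4          # toy chain; state 4 is terminal

def env_step(s, a):                # actions: 0 = left, 1 = right
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == TERMINAL else 0.0), s2 == TERMINAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # trained "network"
Q_tgt = [row[:] for row in Q]               # frozen copy for targets

random.seed(1)
t = 0
for episode in range(300):                  # loop over episodes
    s, done = 0, False                      # start from the initial state
    while not done:                         # loop over time steps
        # Forward pass 1: pick the action (epsilon-greedy, eps = 0.3).
        if random.random() < 0.3:
            a = random.randrange(2)
        else:
            a = Q[s].index(max(Q[s]))
        s2, r, done = env_step(s, a)        # observe reward, next state
        # Forward pass 2: build the target y from the FROZEN network.
        y = r if done else r + GAMMA * max(Q_tgt[s2])
        Q[s][a] += ALPHA * (y - Q[s][a])    # step toward the target
        s, t = s2, t + 1
        if t % SYNC_EVERY == 0:             # refresh the frozen network
            Q_tgt = [row[:] for row in Q]
```

The epsilon-greedy choice and the sync period are hypothetical hyperparameters; in the deep version the table lookups become network forward passes and the update becomes a gradient step.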
you are; move the paddle to the right; or move the paddle to the left. [00:41:53] This demo, with the credits at the bottom of the slide, shows that after training on Breakout using Q-learning, they get a super intelligent agent which figures out a trick to finish the game very quickly. Actually, even good players don't know this trick; professional players do know it. In Breakout you can try to dig a tunnel to get to the other side of the bricks, and then you destroy all the bricks super quickly, from top to bottom instead of bottom up. What's super interesting is that the network figured this out on its own, without human supervision, and this is the kind of thing we want to remember: if we were to input the Go board and output professional players' moves, we would not figure out that type of thing most of the time. [00:42:44] So my question is: what's the input of the Q network in this setting?
[00:42:50] Our goal is to destroy all the bricks, to play Breakout. What should be the input? [00:43:10] Try something. The position of the bricks? The position of the paddle? What else? The ball position? OK, yeah, I agree. [00:43:24] This is what we would call a feature representation: when you're in an environment, you can extract some features, and these are examples of features. Give me the position of the ball: that's one feature. Give me the position of the bricks: another feature. Give me the position of the paddle: another feature. Those are good features for this game, but if you want to get the entire information, you'd better do something else. Yeah: the pixels. [00:43:56] You don't want any human supervision; you don't want to hand-design features. You just take the pixels, play the game, control the paddle; take the pixels. So yeah, this is a good input to the Q network. What's the output?
I said it [00:44:10] earlier: the output of the network will probably be three Q values, representing the actions of going left, going right, and staying idle in the specific state that is the input of the network. So given a pixel image, we want to predict Q scores for the three possible actions. Now, what's the issue with that? Do you think that would work or not? [00:44:41] Can someone think of something going wrong here, looking at the inputs? [00:45:00] OK, I'm going to help you. Yeah? Good point: based on this image, you cannot know if the ball is going up or down. That actually makes it super hard, because the action you take highly depends on whether the ball is going up or down. And even if the ball is going down, you don't know which direction it's going down in. So there's a problem here: there is definitely not enough
information to make a decision on the [00:45:33] action to take, and if it's hard for us, it's going to be hard for the network. So what's a hack to prevent that? It's to take successive frames. Instead of one frame, we can take four successive frames. Here is the same setting as before, but we can see that the ball is going up, we can see which direction it's going, and we know what action we should take, because we know the slope of the ball and whether it's going up or down. Does that make sense? [00:46:05] OK, so this is called preprocessing: given a state, compute a function phi(s) that gives you the history of this state, which is the sequence of the last four frames. What other preprocessing can we do? This is something I want you to be quick about; we learned it together in deep learning: input preprocessing. Remember the second lecture, where the question was: what resolution should we use?
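The phi(s) history can be kept with a fixed-length queue. A minimal sketch; the frames here are placeholder strings standing in for preprocessed images:

```python
from collections import deque

STACK = 4  # number of successive frames in phi(s)

class FrameStacker:
    """phi(s): return the last STACK frames as one stacked observation."""
    def __init__(self):
        self.frames = deque(maxlen=STACK)

    def reset(self, first_frame):
        # At the start of an episode, repeat the first frame to fill history.
        for _ in range(STACK):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, frame):
        self.frames.append(frame)      # the oldest frame drops out
        return list(self.frames)

phi = FrameStacker()
obs0 = phi.reset("f0")   # ['f0', 'f0', 'f0', 'f0']
obs1 = phi.step("f1")    # ['f0', 'f0', 'f0', 'f1']
```

With real images the four frames would be stacked along the channel dimension before being fed to the network.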
Remember, you had a cat [00:46:41] recognition task: what resolution would you want to use? Same thing here: if we can reduce the size of the input, let's do it; if we don't need all that information, let's do it. For example, do you think the colors are important? Very minor; I don't think they're important. So maybe we can grayscale everything: that converts three channels into one channel, which is amazing in terms of computation. What else? [00:47:10] I think we can crop a lot of this; maybe there's a line here above which we don't need to make any decision, and we don't need this score. Maybe. Actually, there are some games where the score is important for decision making. An example is football, or soccer: when you're winning 1-0, and you're playing against a strong team, you'd better get back and defend to keep that 1-0. So the score
is actually [00:47:39] important in the decision-making process, and in fact there are famous coaches in football who have this technique called "park the bus", where you just put your whole team in front of the goal once you have scored. So that's an example. Here, there is no park the bus, and we can definitely get rid of the score, which removes some pixels and reduces the number of computations, and we can reduce to grayscale. [00:48:05] One important thing to be careful about when you reduce to grayscale is that grayscale is a dimensionality reduction technique: you lose information. If you have three channels and you reduce everything to one channel, sometimes different color pixels will end up with the same grayscale value, depending on the grayscale mapping you use, and it's been observed that you sometimes lose information this way. So
let's say the ball and some bricks [00:48:31] have the same grayscale value: then you would not be able to differentiate them. Or say the paddle and the background have the same grayscale value: then you would not differentiate them either. So you have to be careful about that type of thing, and there are methods that do grayscale in other ways, such as luminance. [00:48:48] So we have our phi(s), which is the input to the Q network, and the deep Q-network architecture is going to be a convolutional neural network, because we're working with images. We forward propagate through it; this is the architecture from Mnih, Kavukcuoglu, Silver et al.: convolutional layers followed by two fully connected layers, and you get your Q scores. [00:49:17] Now we get back to our training loop. What do we need to change in it? We said that one frame is not enough, so we preprocess all the frames: the initial
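The grayscale caution is easy to demonstrate: with a naive equal-weight mapping, two visibly different colors (pixel values invented here) collapse to the same gray level, so the ball could become indistinguishable from a brick or the background.

```python
def to_gray(rgb):
    # Naive equal-weight grayscale; luminance weighting (e.g. 0.299 R +
    # 0.587 G + 0.114 B) is one of the alternative mappings mentioned above.
    return sum(rgb) // 3

ball = (90, 120, 150)       # hypothetical bluish ball pixel
brick = (120, 120, 120)     # hypothetical gray brick pixel
same = to_gray(ball) == to_gray(brick)   # True: the distinction is lost
```

Both pixels map to gray level 120, which is exactly the kind of collision to check for before committing to a grayscale preprocessing step.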
state s is converted [00:49:29] to phi(s), the forward-propagated state is phi(s), and so on: everywhere we had s or s', we convert it to phi(s) or phi(s'), which gives us the history. Now, there are a lot more techniques we can plug in here, and we will see three more. One is keeping track of the terminal state. In this loop we should keep track of the terminal state, because we said that if we reach a terminal state we want to end the loop, break the loop. Another reason involves the target y. [00:49:57] Basically, we have to create a boolean to detect terminal states before looping through the time steps, and inside the loop we want to check whether the new s' we're going to is a terminal state. If it is, I can stop this loop and go back and play another episode: start at another starting state and continue my game. Now, the target y that
we compute [00:50:26] is different depending on whether we're in a terminal state or not, because if we're in a terminal state there is no reason to have a discounted long-term reward: there's nothing beyond that terminal state. So if we're in a terminal state, we just set y to the immediate reward, and we break; if we're not in a terminal state, we add the discounted future reward. Any questions on that? [00:50:54] Yep. Another issue we see here, which makes this reinforcement learning setting super different from the classic supervised learning setting, is that we only train on what we explore. I start in a state s, I forward propagate phi(s) through my network, I get my vector of Q values, I select the best Q value, the largest, and I get a new state, because I can now move from state s to s'. So I have a transition: from s, take action a, get s'; or, from phi(s),
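The terminal-state rule for the target reduces to a single branch; a minimal sketch, with a hypothetical discount factor:

```python
GAMMA = 0.9  # discount factor (hypothetical value)

def compute_target(r, done, max_q_next):
    # Terminal state: nothing comes after it, so no discounted future term.
    if done:
        return r
    return r + GAMMA * max_q_next

y_terminal = compute_target(1.0, True, 5.0)    # -> 1.0
y_regular = compute_target(0.0, False, 5.0)    # -> 4.5
```

In the training loop, `done` is exactly the boolean the lecture says to maintain, and `max_q_next` comes from forward propagating phi(s') through the frozen target network.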
take action a, get phi(s'). [00:51:35] Now, this is what I will use to train my network: I can forward propagate phi(s') through the network and get my target y, compare my y to my Q, and then backpropagate. The issue is that I may never explore this state transition again; maybe I will never get there anymore. That's super different from what we do in supervised learning, where you have a dataset, and your dataset can be used many times, with batch gradient descent or any gradient descent algorithm. In one epoch you see all the data points; if you do two epochs, you see every data point two times; if you do ten epochs, ten times. [00:52:15] So every data point can be used several times to train your algorithm in the classic deep learning we've seen together. In this case it doesn't seem possible, because we only train on what we
explore, [00:52:27] and we might never get back there, especially because the training will be influenced by where we go. Maybe there are some places we will never visit, because what we train on and what we learn will direct our decision process, and we will never train on some parts of the game. This is why we have other techniques to keep this training stable. One is called experience replay. [00:52:45] As I said, here is what we are currently doing: we have phi(s), we forward propagate, we take action a, we observe an immediate reward R and a new state phi(s'). Then from phi(s') we can take a new action a', observe a new reward R', and a new state phi(s''), and so on. Each of these is called a state transition, and each can be used to train: one experience leads to one iteration of gradient descent. e1, e2, e3: experience one, experience two, experience
[00:53:26] experience one experience to experience tree and the training will be trained on [00:53:28] tree and the training will be trained on experience one then trained on [00:53:30] experience one then trained on experience two then trained our [00:53:31] experience two then trained our experience tree what we're doing with [00:53:33] experience tree what we're doing with experience replay is the following we [00:53:36] experience replay is the following we will observe experience one because we [00:53:39] will observe experience one because we start in a site we take an action we see [00:53:41] start in a site we take an action we see another state and earn a reward and this [00:53:43] another state and earn a reward and this is called experience one we will create [00:53:45] is called experience one we will create a replay memory you can think of it as a [00:53:49] a replay memory you can think of it as a data structure in computer science and [00:53:51] data structure in computer science and you will place this experience one topo [00:53:53] you will place this experience one topo in the [00:53:53] in the your play memory then from there we will [00:53:56] your play memory then from there we will experience experience - we will put [00:53:59] experience experience - we will put experience - in the replay memory same [00:54:01] experience - in the replay memory same with experience 3 put it in a replay [00:54:02] with experience 3 put it in a replay memory and so on [00:54:04] memory and so on now during training what we will do is [00:54:07] now during training what we will do is we will first train on experience 1 [00:54:09] we will first train on experience 1 because it's the only experience we have [00:54:11] because it's the only experience we have so so far next step instead of training [00:54:15] so so far next step instead of training on e 2 we will train on a sample from a [00:54:17] on e 2 we will train on a sample from a 1 in we - it means we will 
take one out [00:54:20] 1 in we - it means we will take one out of the replay memory and use this one [00:54:21] of the replay memory and use this one for training but we will still continue [00:54:25] for training but we will still continue to experiment something else and we will [00:54:28] to experiment something else and we will sample from there and at every step the [00:54:31] sample from there and at every step the replay memory will become bigger and [00:54:32] replay memory will become bigger and bigger and while we train we will not [00:54:35] bigger and while we train we will not necessarily train on the step we explore [00:54:36] necessarily train on the step we explore we will train on a sample which is the [00:54:39] we will train on a sample which is the replay memory + the new state way we [00:54:41] replay memory + the new state way we explore why is it good is because e 1 as [00:54:46] explore why is it good is because e 1 as you see can be useful many times in the [00:54:48] you see can be useful many times in the training and maybe one was a critical [00:54:50] training and maybe one was a critical state like it was a very important data [00:54:52] state like it was a very important data point to learn or q function and so on [00:54:56] point to learn or q function and so on and so on does the replay memory make [00:54:58] and so on does the replay memory make sense so several advantages one is data [00:55:02] sense so several advantages one is data efficiency we can use data many times [00:55:05] efficiency we can use data many times don't have to use one day to appoint [00:55:06] don't have to use one day to appoint only one time another very important [00:55:10] only one time another very important advantage of experience replay is that [00:55:13] advantage of experience replay is that if you don't use experience replay you [00:55:16] if you don't use experience replay you have a lot of correlation between the [00:55:18] have a lot of 
correlation between the successive data points so let's say the [00:55:20] successive data points so let's say the ball is on the bottom right here and the [00:55:23] ball is on the bottom right here and the ball is going to the top left for the [00:55:26] ball is going to the top left for the next 10 data points the ball is always [00:55:30] next 10 data points the ball is always going to go to the top left and it means [00:55:33] going to go to the top left and it means the action you can take is always the [00:55:37] the action you can take is always the same it actually doesn't matter a lot [00:55:39] same it actually doesn't matter a lot because the ball is going up but most [00:55:41] because the ball is going up but most likely you want to followed where the [00:55:43] likely you want to followed where the ball is going so the action will be to [00:55:45] ball is going so the action will be to go towards the ball for 10 actions in a [00:55:48] go towards the ball for 10 actions in a row and then the ball will bounce on the [00:55:51] row and then the ball will bounce on the wall and on the top and go back down [00:55:53] wall and on the top and go back down here down to the bottom left the bottom [00:55:56] here down to the bottom left the bottom right what will happen if your paddle is [00:55:59] right what will happen if your paddle is here is that for 10 steps in a row you [00:56:01] here is that for 10 steps in a row you will send your paddle on the right [00:56:04] will send your paddle on the right remember what we said when which when we [00:56:06] remember what we said when which when we asked the [00:56:07] asked the question if you had to train a cat vs. [00:56:09] question if you had to train a cat vs. 
dog classifier with batches of images of [00:56:11] dog classifier with batches of images of cats batches of images of dog trained [00:56:14] cats batches of images of dog trained first on the cats then trains on the [00:56:15] first on the cats then trains on the dogs then trains on the cats then trains [00:56:17] dogs then trains on the cats then trains on the dogs we will not converge because [00:56:19] on the dogs we will not converge because your network will be super biased [00:56:21] your network will be super biased towards predicting chat after seeing ten [00:56:23] towards predicting chat after seeing ten images of cat super bias bit with [00:56:26] images of cat super bias bit with predicting dogs when it sees ten images [00:56:28] predicting dogs when it sees ten images of dog that's what's happening here [00:56:30] of dog that's what's happening here so you want to deke or elate all these [00:56:33] so you want to deke or elate all these experiences you want to be able to take [00:56:35] experiences you want to be able to take one experience take another one that has [00:56:36] one experience take another one that has nothing to do with it and so on this is [00:56:39] nothing to do with it and so on this is what experience pure play goes and the [00:56:41] what experience pure play goes and the third one is that the third one is that [00:56:44] third one is that the third one is that you're basically trading computation and [00:56:47] you're basically trading computation and memory against exploration exploration [00:56:50] memory against exploration exploration is super costly the state space might be [00:56:52] is super costly the state space might be super big but you know you have enough [00:56:55] super big but you know you have enough computation probably you can have a lot [00:56:56] computation probably you can have a lot of competition and you have memory space [00:56:58] of competition and you have memory space let's use an experience replay 
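The replay memory described above ("you can think of it as a data structure") can be sketched as a fixed-capacity buffer of transition tuples. This is a minimal illustration, not DeepMind's implementation; the class and parameter names here are my own.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest experience once the buffer is full
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling is what decorrelates successive transitions
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

In the full algorithm, each environment step calls `add` with the observed transition, and each gradient step calls `sample` to get a decorrelated minibatch, so one transition can be reused many times.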
[00:57:02] Okay, so let's add experience replay to our code. Here, the transition resulting from this part is added to the replay memory D and will not necessarily be used in this iteration. So what's happening is: we forward propagate Phi(s), we take an action a and observe a reward, and this action leads to a state Phi(s'); this is an experience. Instead of training on this experience, I'm just going to take it and put it in the replay memory ("add experience to replay memory"), and what I will train on is not this experience but a random minibatch of transitions sampled from the replay memory. So you see, we're exploring, but we're not training on what we explore; we're training on the replay memory. But the replay memory is dynamic: it changes and is updated. The sampled transitions from the replay memory will be used to do the update.
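The target for each sampled transition, y_j = r_j if the transition is terminal and y_j = r_j + gamma * max_a' Q(Phi(s'_j), a') otherwise, can be sketched with NumPy. This is an illustrative helper with names of my own choosing; `next_q_values` stands in for the Q-network's outputs on the sampled next states.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """y_j = r_j for terminal transitions, else r_j + gamma * max_a' Q(phi'_j, a')."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    # max over the action dimension for each sampled next state
    max_next = np.asarray(next_q_values, dtype=float).max(axis=1)
    return np.where(dones, rewards, rewards + gamma * max_next)
```

The network is then trained by one gradient step pushing Q(Phi(s_j), a_j) toward y_j for the minibatch.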
That's the hack. [00:58:03] Now, the last hack we want to talk about is exploration versus exploitation. As a human, let's say you're commuting to Stanford every day. You know the road you're commuting on, you always take the same road, and you're biased towards taking this road. Why? Because the first time you took it, it went well, and the more you take it, the more you learn about it. You know the tricks of how to drive fast; you know that this light is going to be green at that moment, and so on. You build very good expertise in this road, you're a super expert, but maybe there's another road that you don't even try that is better. You just don't try it because you're focused on that road: you're doing exploitation, you exploit what you already know. Exploration would be: okay, let's do it, I'm going to try another road today. I might get to the course late, but maybe I will make a good discovery, I will like this road, and I will take it later on. There's a trade-off between these two, because the RL algorithm is going to figure out some strategies that are super good and will try to do local search in these to get better and better, but there might be another minimum that is better than this one, and you don't explore it. With the algorithm we currently have there is no trade-off between exploitation and exploration: we are almost doing only exploitation. So how do we incentivize exploration? Do you guys have an idea?

[00:59:45] So right now, when we're in a state s, we forward propagate the state through the network and we always take the action that is the best action. We're exploiting what we already know: we take the best action. Instead of taking this best action, what can we do? Yep: Monte Carlo sampling, picking another action, trying something else to get out of there; that's the ratio of times you take the best action versus exploring another action. Okay: take a hyperparameter that tells you when you can explore and when you can exploit, is that what you mean? Yeah, that's a good point, and that's the solution: you can take a hyperparameter that is a probability, telling you to explore with this probability, and otherwise, with one minus this probability, to exploit. That's what we're going to do.

[01:00:44] So let's look at why exploitation without exploration doesn't work. We're in initial state s1 and we have three options: either we use action a1 to go to s2 and get a reward of 0, or we use action a2 to go to s3 and get a reward of 1, or we use action a3 to go to s4 and get a reward of 1,000. This is obviously where we want to go: we want to go to s4, because it has the maximum reward, and we don't need to do much computation in our head. It's simple, there is no discount, it's direct. Just after initializing the Q-network, you get the following Q-values: forward propagate s1 through the network and get 0.5 for taking action 1, 0.4 for taking action 2, and 0.3 for taking action 3. This is obviously not good, but our network was randomly initialized. What it's telling us is that 0.5 is the maximum, so we should take action 1. So let's take action 1 and observe s2: we observe a reward of 0. Our target, because it's a terminal state, is equal to the reward only; there is no additional term. We want our Q to match our target; our target is 0, so Q should match 0. So we train, and we get a Q that should be 0. Does that make sense?

[01:02:07] Now we do another round of iteration. We're in s1, back at the beginning of the episode, and we see that our Q-function tells us that action 2 is the best, because 0.4 is the maximum value. It means go to s3. I go to s3, I observe a reward of 1. What does it mean? It's a terminal state, so my target is 1: y equals 1. I want the Q to match my y, so my Q should be 1. Now I continue. Third step: the Q-function says go via a2, so I go via a2, and nothing happens; I already matched the reward. Fourth step: go via a2. You see what happens? We will never get there, to s4, because we're not exploring. So instead of doing that, what we're saying is: 5% of the time, take a random action to explore, and 95% of the time, follow your exploitation. So that's what we add: with probability epsilon, the hyperparameter, take a random action a; otherwise, do what we were doing before and exploit. Does that make sense? Okay, cool.
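The epsilon-greedy rule just described (with probability epsilon take a random action, otherwise take the argmax of the Q-values) is only a few lines. A sketch in Python, using the toy Q-values from the example:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon take a random action, otherwise the argmax action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# With the randomly initialized values Q(s1, .) = [0.5, 0.4, 0.3],
# pure exploitation (epsilon = 0) always returns action 0.
```

With epsilon = 0.05 the agent eventually tries the third action and can discover the reward of 1,000.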
[01:03:22] So now we've plugged all these tricks into our pseudocode, and this is our new pseudocode. We have to initialize the replay memory, which we didn't have to do earlier. In blue you can find the lines added for the replay memory, in orange the lines added for checking the terminal state, in purple the line added for epsilon-greedy exploration versus exploitation, and finally, in bold, the preprocessing. Any questions on that? So that's what we wanted to see: a variant of how deep learning can be used in a setting that is not necessarily the classic supervised learning setting. [01:04:06] And you see that the main advantage of deep learning in this case is that it's a good function approximator: the convolutional neural network can extract a lot of information from the pixels that we were not able to get with other networks. Okay, so let's see what we have
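To see all the pieces of that final pseudocode working together (replay memory, terminal-state targets, epsilon-greedy), here is a runnable miniature on the lecture's three-action toy MDP. It is a sketch under simplifying assumptions, not the real DQN: a Q-table over s1's three actions stands in for the Q-network, every transition is terminal so the target is just y = r, and epsilon is set higher than the lecture's 5% so the tiny run converges quickly.

```python
import random

def train_toy_dqn(episodes=2000, epsilon=0.2, lr=0.5, seed=0):
    """DQN pseudocode on the toy MDP: from s1, action i yields reward
    [0, 1, 1000][i] and a terminal state."""
    random.seed(seed)
    rewards = [0.0, 1.0, 1000.0]
    q = [0.5, 0.4, 0.3]        # the "randomly initialized" Q(s1, a) from the example
    memory = []                # replay memory D
    for _ in range(episodes):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(3)                   # explore
        else:
            a = max(range(3), key=lambda i: q[i])     # exploit
        memory.append((a, rewards[a]))                # store the transition in D
        a_j, r_j = random.choice(memory)              # sample a transition from D
        q[a_j] += lr * (r_j - q[a_j])                 # terminal state: target y = r
    return q

q_final = train_toy_dqn()
```

With exploration, the agent eventually tries action 3, the replay memory lets that rare transition be reused, and Q(s1, a3) climbs toward 1,000; with epsilon = 0 it would stay stuck on the first two actions, exactly as in the walkthrough.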
here we have our super battery but [01:04:26] have here we have our super battery but that's gonna dig a tunnel and it's going [01:04:28] that's gonna dig a tunnel and it's going to destroy all the bricks super quickly [01:04:32] it's good to see that after building it [01:04:34] it's good to see that after building it like so this is work from deep mines [01:04:37] like so this is work from deep mines team and you can find this video on [01:04:39] team and you can find this video on YouTube okay another thing I wanted to [01:04:43] YouTube okay another thing I wanted to say quickly is what's the difference [01:04:44] say quickly is what's the difference between weed and without human knowledge [01:04:46] between weed and without human knowledge you will see a lot of people a lot of [01:04:48] you will see a lot of people a lot of papers mentioning that this algorithm [01:04:51] papers mentioning that this algorithm was trained with human learned knowledge [01:04:53] was trained with human learned knowledge or this algorithm was trained without [01:04:55] or this algorithm was trained without any human in the loop [01:04:56] any human in the loop why is human knowledge very important [01:05:00] why is human knowledge very important like think about it just playing one [01:05:03] like think about it just playing one game as a human and teaching that the [01:05:05] game as a human and teaching that the algorithm will help the algorithm a lot [01:05:07] algorithm will help the algorithm a lot when the algorithm sees this game what [01:05:11] when the algorithm sees this game what it sees its pixels what we see when we [01:05:15] it sees its pixels what we see when we see that game we see that there is a key [01:05:17] see that game we see that there is a key here we know the key is usually a good [01:05:19] here we know the key is usually a good thing so we have a lot of context right [01:05:21] thing so we have a lot of context right as a human we know I'm probably 
gonna go [01:05:24] as a human we know I'm probably gonna go for the key I'm not gonna go for this [01:05:25] for the key I'm not gonna go for this this thing no same ladder [01:05:28] this thing no same ladder what is the ladder we directly identify [01:05:30] what is the ladder we directly identify that the ladder is something we can go [01:05:32] that the ladder is something we can go up and down we identified that this rope [01:05:35] up and down we identified that this rope is probably something I can use to jump [01:05:36] is probably something I can use to jump from one side to the other so as a human [01:05:38] from one side to the other so as a human there is a lot more background [01:05:40] there is a lot more background information that we have even without [01:05:41] information that we have even without knowing it without realizing it so [01:05:44] knowing it without realizing it so there's a huge difference between [01:05:45] there's a huge difference between algorithms trained with [01:05:47] algorithms trained with human-in-the-loop and without human in [01:05:49] human-in-the-loop and without human in the loop this game is actually Montezuma [01:05:51] the loop this game is actually Montezuma revenge the dqn algorithm when the paper [01:05:54] revenge the dqn algorithm when the paper came out on underneath on nature in [01:05:56] came out on underneath on nature in nature the second the second version of [01:05:58] nature the second the second version of the paper they showed that they beat [01:06:00] the paper they showed that they beat human on 49 games that are the same type [01:06:03] human on 49 games that are the same type of games I as break out but this one was [01:06:05] of games I as break out but this one was the hardest one so they couldn't beat [01:06:08] the hardest one so they couldn't beat human on this one and the reason was [01:06:11] human on this one and the reason was because there's a lot of information and [01:06:13] because 
there's a lot of information and also the game has is very long [01:06:17] also the game has is very long so in order it's called Montezuma [01:06:18] so in order it's called Montezuma revenge and I think ranting pyramids is [01:06:21] revenge and I think ranting pyramids is going to talk about it a little later [01:06:22] going to talk about it a little later but in order to get to win this game you [01:06:25] but in order to get to win this game you have to go through a lot of different [01:06:27] have to go through a lot of different stages and it's super long so it's super [01:06:30] stages and it's super long so it's super hard for the algorithm to explore all [01:06:33] hard for the algorithm to explore all the state space okay so that said I will [01:06:38] the state space okay so that said I will show you a few more games that that the [01:06:41] show you a few more games that that the deepmind team has solved pong is one [01:06:43] deepmind team has solved pong is one sequence is another one and space [01:06:46] sequence is another one and space invaders that you might know which which [01:06:48] invaders that you might know which which is probably the most famous of the three [01:06:50] is probably the most famous of the three Juno okay so that said I'm gonna hand in [01:06:56] Juno okay so that said I'm gonna hand in the microphone to we're lucky to have an [01:06:59] the microphone to we're lucky to have an oral expert so Rammstein terawatts is a [01:07:02] oral expert so Rammstein terawatts is a fourth-year PhD students in RL working [01:07:06] fourth-year PhD students in RL working with professor Bernstein at Stanford and [01:07:08] with professor Bernstein at Stanford and he will tell us a little bit about his [01:07:11] he will tell us a little bit about his experience and he will show us some [01:07:12] experience and he will show us some advanced applications of deep learning [01:07:14] advanced applications of deep learning and RL and how these 
plug in together [01:07:18] and RL and how these plug in together thank you thanks Cal for that [01:07:20] thank you thanks Cal for that introduction [01:07:21] introduction oh yeah can everyone hear me now all [01:07:24] oh yeah can everyone hear me now all right good cool okay first I have like [01:07:29] right good cool okay first I have like eight nine minutes I have more okay okay [01:07:34] eight nine minutes I have more okay okay first question after seeing that lecture [01:07:38] first question after seeing that lecture so far look how many are you're thinking [01:07:41] so far look how many are you're thinking that RL is actually cool look honestly [01:07:43] that RL is actually cool look honestly that's like oh that's a lot [01:07:46] that's like oh that's a lot oh yeah that's a lot okay my hope is [01:07:50] oh yeah that's a lot okay my hope is after showing you some other advanced [01:07:52] after showing you some other advanced topics ears then the percentage got even [01:07:54] topics ears then the percentage got even increase so let's let's see it's almost [01:07:59] increase so let's let's see it's almost impossible to talk about like [01:08:01] impossible to talk about like advancement RL like recently without [01:08:03] advancement RL like recently without mentioning alphago [01:08:04] mentioning alphago I think somewhere right now who wrote [01:08:06] I think somewhere right now who wrote that on a table that it's almost 10 to [01:08:10] that on a table that it's almost 10 to the power 170 different configuration of [01:08:13] the power 170 different configuration of the board and that's roughly more than I [01:08:17] the board and that's roughly more than I mean that's more than the estimated [01:08:19] mean that's more than the estimated number of atoms in the universe so one [01:08:21] number of atoms in the universe so one traditional algorithm before the deep [01:08:24] traditional algorithm before the deep learning and stuff like that was 
like [01:08:25] learning and stuff like that was like three searching RL which is basically go [01:08:29] three searching RL which is basically go exhaustively search all the [01:08:30] exhaustively search all the a possible action that you can take and [01:08:32] a possible action that you can take and they'll take the best one in that [01:08:34] they'll take the best one in that situation also good that's all almost [01:08:36] situation also good that's all almost impossible so what they do that's also a [01:08:39] impossible so what they do that's also a paper from deep mind is that they train [01:08:43] paper from deep mind is that they train anyone Ezra for that they kind of [01:08:45] anyone Ezra for that they kind of marriage the tree search we do a bit [01:08:48] marriage the tree search we do a bit different and neural network that they [01:08:50] different and neural network that they have they have two kinds of networks one [01:08:53] have they have two kinds of networks one is called value network and value [01:08:55] is called value network and value network is basically consuming this [01:08:57] network is basically consuming this image image of a board and telling you [01:09:01] image image of a board and telling you what's the probability that if you can [01:09:03] what's the probability that if you can win in this situation so if the value is [01:09:06] win in this situation so if the value is higher than the probability of winning [01:09:08] higher than the probability of winning is higher how does it help you he help [01:09:11] is higher how does it help you he help you in the case that if you want to [01:09:12] you in the case that if you want to search for the action you don't have to [01:09:14] search for the action you don't have to go until the end of the game because the [01:09:15] go until the end of the game because the end of the game is a lot of steps and [01:09:17] end of the game is a lot of steps and it's almost impossible to go to 
the end [01:09:19] it's almost impossible to go to the end of the game in all the simulations so [01:09:21] of the game in all the simulations so that just helps you to understand what's [01:09:23] that just helps you to understand what's the value of each game like beforehand [01:09:25] the value of each game like beforehand like after look for these simple 50s [01:09:26] like after look for these simple 50s that if you're gonna win that game or if [01:09:28] that if you're gonna win that game or if you're gonna lose that game there's [01:09:29] you're gonna lose that game there's another a network of the policy Network [01:09:32] another a network of the policy Network which helps you to take action but I [01:09:34] which helps you to take action but I think the most interesting thing of the [01:09:37] think the most interesting thing of the Alpha goal is that it's trained from [01:09:39] Alpha goal is that it's trained from scratch so it's trance from nothing and [01:09:42] scratch so it's trance from nothing and if they have a tree called self play [01:09:45] if they have a tree called self play that there is two AI playing with each [01:09:49] that there is two AI playing with each other the best one I replicate the best [01:09:51] other the best one I replicate the best the best one I can keep it fixed and I [01:09:54] the best one I can keep it fixed and I have another one that is trying to cop [01:09:56] have another one that is trying to cop beat the previous version of itself and [01:09:58] beat the previous version of itself and after it complete the previous version [01:10:00] after it complete the previous version of itself like you reliably many times [01:10:02] of itself like you reliably many times then I replace this again for the [01:10:04] then I replace this again for the previous part and then I just said so [01:10:06] previous part and then I just said so this is a training curve of like a self [01:10:08] this is a training curve of like a 
As you see, [01:10:10] it takes a lot of compute, so that's kind of crazy, but finally they beat the [01:10:16] human. Okay, another type of algorithm; this is a whole different class [01:10:23] of algorithms called policy gradients. They developed an algorithm called trust [01:10:29] region policy optimization. Yeah, can I stop this video here? [01:10:37] Okay, great. So, policy gradient algorithms: [01:10:45] what I can do is stop it from here. Okay, so [01:10:53] in the DQN setting that you have seen, you compute a Q [01:11:00] value for each state, and then what you do is take the argmax of [01:11:04] this with respect to the action, and then you choose the action you want, [01:11:07] right? But what you care about at the end of [01:11:10] the day is the action, the mapping from state to action, which [01:11:14] we call a policy. So what you want at the end of the day is actually [01:11:18] the policy (what action should I take?), not really the Q value itself, right? [01:11:21] So this class of methods [01:11:24] called policy gradients tries to directly optimize the policy: [01:11:29] rather than updating the Q function, I compute the gradient of my policy and I [01:11:34] update my policy network again and again and again. So let's see these videos: [01:11:40] this is a guy that is trying to reach the pink ball over there, and [01:11:47] sometimes it gets hit by some external force; it's trained with the [01:11:53] algorithm called PPO, obviously a policy gradient method, and tries to reach that ball. [01:11:58] I think you've heard of OpenAI [01:12:02] Five, the bot that is playing Dota; that is basically the PPO [01:12:09] algorithm, with a lot of compute behind it. I [01:12:15] have the numbers here: there are about 180 [01:12:19] years of play in one day.
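The "compute the gradient of my policy and update again and again" loop can be illustrated with REINFORCE, the simplest policy-gradient method, on a made-up two-armed bandit (a sketch only; PPO and TRPO add trust-region machinery on top of this basic idea, and the reward values here are invented):

```python
import math
import random

random.seed(0)

# REINFORCE on a two-armed bandit: a minimal policy-gradient sketch.
# theta are the logits of a softmax policy; instead of learning Q values
# we nudge theta along grad log pi(a) * reward.
theta = [0.0, 0.0]

def policy(theta):
    """Softmax over the two logits."""
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [p / s for p in z]

rewards = [0.0, 1.0]   # arm 1 is better, known only through sampling
lr = 0.1
for _ in range(2000):
    probs = policy(theta)
    a = 0 if random.random() < probs[0] else 1
    r = rewards[a]
    # gradient of log softmax: indicator(a == i) - pi(i)
    for i in range(2):
        theta[i] += lr * ((1.0 if i == a else 0.0) - probs[i]) * r

print(policy(theta)[1])  # probability of the better arm, close to 1
```

No Q function ever appears: the policy's own parameters are updated directly, which is the distinction the lecture is drawing.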
That is how [01:12:22] much compute it needs. So there [01:12:27] is another, even funnier video; [01:12:33] again the same idea, self-play: you put two agents in front of [01:12:38] each other and they try to beat each other, and whoever wins [01:12:42] gets the reward. The most interesting [01:12:46] part is that, for example, in that game the purpose is just to pull the other [01:12:51] one out, right? But they discover some emergent behaviors [01:12:55] which for us humans make sense, but for them to learn out of [01:13:01] nothing is kind of cool. There's [01:13:15] one risk here when they're playing (okay, this guy's trying to kick the ball [01:13:19] in), and one risk here is to overfit. [01:13:27] That's also cool. One technical [01:13:36] point before we move on: [01:13:41] okay, here there are two agents playing with [01:13:45] each other, and we are just updating against the best other agent, [01:13:49] as [01:13:52] we were doing in self-play. The risk is that you overfit to the particular agent [01:13:56] that's in front of you: the agent in front of you is powerful, but you [01:14:01] might overfit to it, and if I then bring in an agent that is not that powerful but [01:14:04] uses a simple trick that the powerful agent would never use, then you [01:14:09] might just lose the game, right? So one trick here to make it more stable is, [01:14:13] rather than playing against only one agent, you alternate between [01:14:16] different versions of the agent itself, so [01:14:19] it learns all the skills together and doesn't overfit to [01:14:22] one opponent. There's another thing called [01:14:28] meta-learning. Meta-learning is a whole [01:14:31] different family of algorithms again, and the [01:14:33] point is that a lot of tasks are [01:14:35] similar to each other, right? The core [01:14:37] example: walking to the left, walking to the right, and walking in the forward [01:14:40] direction are basically the same task, essentially.
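Before going on, the stabilization trick just described, sampling opponents from a pool of frozen past versions instead of always the single latest agent, might be organized like this (a schematic sketch; `play_and_update` is a hypothetical placeholder for a real self-play game plus a gradient update):

```python
import random

random.seed(0)

# Opponent-pool self-play: a schematic sketch of the trick above, not a
# real training loop. We keep frozen snapshots of past agents and sample
# one as the opponent each episode, so the learner cannot overfit to the
# quirks of a single opponent.
snapshots = [{"version": 0}]          # pool of frozen past agents
current = {"version": 0}              # the agent being trained

def play_and_update(agent, opponent):
    """Hypothetical placeholder: one self-play game plus one RL update."""
    agent["version"] += 1             # pretend the agent improved a bit

for episode in range(9):
    opponent = random.choice(snapshots)   # alternate among past versions
    play_and_update(current, opponent)
    if episode % 3 == 2:                  # periodically freeze a snapshot
        snapshots.append(dict(current))

print(len(snapshots), current["version"])  # 4 snapshots after 9 updates
```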
So the point is, rather than [01:14:44] training on a single task, like "go left" or "go right", you train on a [01:14:49] distribution of tasks that are similar to each other, and then the idea is that [01:14:53] for each specific task I should learn with very few gradient steps: [01:14:59] very few updates should be enough. [01:15:01] So, okay, play this video: at the beginning this agent, which has [01:15:07] been trained with meta-learning before, doesn't know how to move; but [01:15:11] just look at the number of gradient steps: after two or three gradient [01:15:14] steps [01:15:15] it totally knows how to move. That normally takes a lot of steps to train, [01:15:19] but not here, because of the meta-learning approach that's used. [01:15:22] Meta-learning is also cool; the [01:15:25] algorithm is from Berkeley, from Chelsea Finn, [01:15:27] who is now coming to Stanford; it's [01:15:29] called model-agnostic meta-learning. [01:15:34] All right.
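The "few gradient steps should be enough" objective can be shown on a deliberately tiny toy problem with one scalar parameter and hand-derived gradients (the learning rates and task targets are invented for illustration; this is the shape of MAML, not Finn et al.'s full algorithm):

```python
# Toy MAML-style meta-learning. Each "task" t asks the parameter to match
# a target value, with loss (theta - t)^2. The outer loop trains an
# initialization theta from which ONE inner gradient step already does
# well on every task in the distribution.
alpha, beta = 0.45, 1.0          # inner / outer learning rates (invented)
tasks = [-2.0, 0.0, 3.0]         # task distribution: target values
theta = 10.0                     # meta-initialization being learned

for _ in range(200):
    meta_grad = 0.0
    for t in tasks:
        adapted = theta - alpha * 2 * (theta - t)      # one inner step
        # d/dtheta of (adapted - t)^2, with adapted a function of theta
        meta_grad += 2 * (adapted - t) * (1 - 2 * alpha)
    theta -= beta * meta_grad

# After meta-training, a single inner step lands close to every target.
for t in tasks:
    adapted = theta - alpha * 2 * (theta - t)
    print(t, round(adapted, 2))
```

The meta-gradient flows through the inner update, which is exactly what makes the learned initialization "adaptable" rather than merely average.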
Another point: this very interesting game, Montezuma's Revenge. [01:15:39] (How much time do we have? Yeah.) [01:15:44] So, you've seen the exploration / [01:15:48] exploitation dilemma, right? If you don't explore, you're going to fail [01:15:54] many times with the exploration scheme that you just saw. This is a [01:16:02] map of this particular game, and you can see all the rooms of that game; [01:16:07] I think it has twenty-one or twenty-something [01:16:12] different rooms that are hard to reach. So [01:16:16] this recent paper from Google [01:16:18] Brain, from Marc Bellemare and team, is called [01:16:21] unifying the count-based methods for exploration. Exploration is essentially a very [01:16:25] hard challenge, mostly in situations where the reward is sparse, exactly as [01:16:30] in this game: the first reward you get is when you reach the key, right, [01:16:34] and from the top to here it's almost two [01:16:38] hundred steps; you'd need to get the [01:16:41] actions over those two hundred steps exactly [01:16:43] right, and with random exploration that's [01:16:45] almost impossible; you're never going to [01:16:47] do it. The very interesting trick [01:16:50] here is that you keep a count of [01:16:54] how many times you've visited each state, and [01:16:56] then if you visit a state that [01:17:02] has a low count, you give [01:17:04] a reward to the agent; we call it the [01:17:06] intrinsic reward. [01:17:18] So the agent also has an incentive, [01:17:28] the incentive to just go explore, [01:17:31] because that increases the counts of [01:17:34] states it has never seen before. So this [01:17:36] gives it the drive to try new things: [01:17:38] it just goes and visits [01:17:41] different rooms. This game [01:17:45] is interesting; if you ask certain people, [01:17:47] there is a huge amount of research [01:17:51] online on solving this game; [01:17:52] this is the highest score on the game, and [01:17:54] it's just fun to see the agent play. [01:18:04] [Music] [01:18:09] Any questions? All right.
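The count-based bonus just described can be sketched in a few lines (a simplified tabular version; the actual paper derives pseudo-counts from a density model, since on raw pixel states every state would look new):

```python
import math
from collections import defaultdict

# Tabular count-based exploration bonus: a simplified sketch of the idea
# above, assuming a small discrete state space. Rarely visited states
# earn a large intrinsic reward, pushing the agent toward unseen rooms.
visit_counts = defaultdict(int)

def intrinsic_reward(state, scale=1.0):
    """Bonus that decays as a state is visited more often."""
    visit_counts[state] += 1
    return scale / math.sqrt(visit_counts[state])

for _ in range(99):
    intrinsic_reward("room_A")          # a room the agent knows well
print(intrinsic_reward("room_A"))       # 100th visit: bonus 0.1
print(intrinsic_reward("room_B"))       # first visit: bonus 1.0
```

In training, this bonus would be added to the (sparse) environment reward, so the agent is paid for novelty even before it ever reaches the key.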
There is also another interesting point that would be [01:18:22] just fun to know about, called imitation learning. Imitation learning is [01:18:27] for the case where, with an RL agent, [01:18:29] sometimes you don't know the reward. [01:18:32] For example, in Atari games the reward is very well-defined, right? If I get [01:18:36] the key, I get the reward; that's just obvious. But sometimes defining the [01:18:41] reward is hard: for example, when a car (the blue one) wants to drive on [01:18:46] some highway, what is the definition of the reward? [01:18:50] We don't have a clear definition of that. But on the other hand [01:18:52] you have a human [01:18:53] expert who can drive for us, and then we [01:18:56] see: oh, this is the right way of driving. So in this situation we have [01:19:00] something called imitation learning, where we try to mimic the behavior of an [01:19:04] expert. Not exactly copying it, [01:19:07] because if we just copy, then when you [01:19:09] show us a completely different state [01:19:11] we don't know what to do; instead, from [01:19:13] the expert we learn. This is my example, [01:19:15] and there's a paper called [01:19:18] generative adversarial imitation learning, which was from Stefano [01:19:21] Ermon's group here at Stanford; that was also [01:19:23] interesting. Well, I think that's an advanced [01:19:27] topic; if you have any questions, I'm here. [01:19:30] For next week: there are no [01:19:42] new assignments. [01:19:44] Now, about [01:19:46] projects: as you know, [01:19:55] there is going to be project team [01:19:59] mentorship this Friday; we have [01:20:01] a section on reading research papers, [01:20:03] where we'll go over object detection, [01:20:07] and there will be two papers from [01:20:09] Redmon. Okay. ================================================================================ LECTURE 010 ================================================================================ Stanford CS230: Deep Learning | Autumn 2018 | Lecture 10 - Chatbots / Closing Remarks Source: https://www.youtube.com/watch?v=IFLstgCNOA4 --- Transcript [00:00:04] So hello everyone, and welcome to the
last lecture of CS230, deep learning. [00:00:12] It's been 10 weeks, [00:00:13] and you've been studying deep learning [00:00:16] all around: starting with fully connected [00:00:19] networks, understanding how to boost [00:00:22] these networks and make them better, and [00:00:24] then using recurrent neural networks in [00:00:27] the last part and convolutional neural [00:00:29] networks in the fourth part to build [00:00:31] models for imaging, text, and other [00:00:35] applications. So today is the class [00:00:38] wrap-up, and the lecture might be [00:00:40] slightly shorter than usual, but we're [00:00:45] going to go over a small case study on [00:00:48] conversational assistants to start with, [00:00:51] which is a new topic; we will do a [00:00:55] small quiz competition with Monty, and [00:00:57] the fastest person who has the best [00:01:01] answer will win 400 hours of GPU credits [00:01:06] on Amazon, [00:01:08] so you guys can start [00:01:11] working on it. [00:01:13] We will see some class project advice, [00:01:16] because you guys have about two weeks, [00:01:19] less than two weeks, before the [00:01:21] poster presentation and the final [00:01:24] project due date. We'll also go over some [00:01:28] of the next steps after CS230 (what [00:01:31] have our students done over the past [00:01:33] year, and what we think are good next [00:01:36] steps), and closing remarks to finish. [00:01:40] By the way, if you have a clicker with [00:01:43] you, please bring it to me. [00:01:46] Okay, so let's get started with how to [00:01:49] build a chatbot to help students find [00:01:52] and enroll in the right course. This [00:01:57] is going to be a pretty simple case of a [00:01:59] chatbot, because chatbots and [00:02:01] commercial conversational assistants in [00:02:04] general have been very hard to build and [00:02:06] are still an evolving topic; there are some [00:02:08] places where academia has helped [00:02:12] chatbots improve, and here we're [00:02:15] going to see how we can take the [00:02:17] algorithms we've learned in this [00:02:19] class and plug them into a conversational [00:02:22] setting. Sounds good? So let me give [00:02:24] you an
example. Students might write to [00:02:28] the chatbot: "Hi, I want to enroll in CS [00:02:30] 106A for winter 2019 to learn coding." The [00:02:35] chatbot can answer: "For sure, I just [00:02:38] enrolled you." So that would be one goal [00:02:40] of the chatbot. A second example might [00:02:44] be finding information about classes: "Hi, [00:02:48] what are the undergraduate-level history [00:02:51] classes offered in spring 2019?" Then the [00:02:55] chatbot can get back to the student [00:02:56] and say: "Here's the list of history [00:02:58] classes offered in spring 2019." So we're [00:03:03] making a small assumption here: we're [00:03:05] building a chatbot for a very [00:03:06] restricted area. In general, a lot of the [00:03:10] time, the chatbots that work very well are [00:03:13] super goal-oriented or transactional, and [00:03:17] the space of possible [00:03:21] requests from users is small, smaller [00:03:24] than what you could expect in other [00:03:26] industrial settings. So here we're making [00:03:28] the assumption that the students will [00:03:30] only try to find information about a [00:03:32] course or will try to enroll in a [00:03:35] course. So I want you guys to pair up in [00:03:39] groups of two or three and try to come [00:03:42] up with ideas for which methods we've [00:03:46] seen together can be used to [00:03:49] implement such a chatbot, okay? So take a [00:03:52] minute, [00:03:53] introduce yourself to your mates, and [00:03:55] try to figure out which methods can be [00:03:58] leveraged in this case. Okay, let's see [00:04:03] what we have here: for natural [00:04:06] language processing, transfer learning; [00:04:09] and "an LSTM to pick out important [00:04:11] words from inputs; based on those input [00:04:14] triggers, output some predefined [00:04:16] information from storage." Yeah, so this [00:04:19] seems to say that there is going to [00:04:23] be one learning part, where we'd probably [00:04:26] have recurrent neural networks [00:04:29] helping out, and one knowledge-base [00:04:32] or storage part, where we can retrieve [00:04:34] information. We're going to see [00:04:36] some attention models; it's true that today
a lot of natural language processing [00:04:49] models are built with attention models. "An RNN for speech recognition and speech [00:04:53] generation": so, we didn't talk about the speech part; so far we assume the [00:04:56] conversational assistant is text-based, but later on we will see what happens if [00:05:01] we want to add speech to it. "Fancy [00:05:08] methods, or reinforcement learning for making decisions about responses": that's [00:05:15] interesting. So why do you guys think we would need reinforcement learning? Yes: [00:05:26] "...a sequence of different states, and you also have [00:05:27] a value associated; it's [00:05:29] very goal-oriented, and so you could sort [00:05:32] of address it in that fashion." Yeah, that's good. [00:05:34] So, just to repeat: it's important to keep [00:05:37] a notion of context; also, we have a sequence of utterances from the user and [00:05:42] the conversational assistant, and the [00:05:46] outcome of the conversation probably comes far along the way, not at every [00:05:50] step. So that's true: reinforcement [00:05:54] learning has been a research topic for [00:05:57] conversational assistants as well, and often [00:05:59] we will try to learn a policy for [00:06:02] the chatbot which, given a state, will tell us what action to take next. This [00:06:06] can be done using Q-learning, which is the method we've seen together, or [00:06:10] sometimes with policy gradients. Okay, "word encoding", [00:06:15] so word embeddings, probably? Okay, [00:06:22] cool. So I agree, there are many ways to [00:06:26] plug a deep learning algorithm into [00:06:28] this chatbot setting; we're going to see [00:06:30] a few of them. First, I'd like to [00:06:39] introduce some vocabulary which is [00:06:41] commonly used when talking about [00:06:43] conversational [00:06:45] assistants. An utterance: you can think [00:06:48] of it as a user input; if I say the [00:06:50] student utterance, it's the sentence that [00:06:53] was written by the student for the [00:06:55] chatbot, and the assistant utterance is the one [00:06:59] coming from the chatbot's side. The intent [00:07:01] denotes the intention of the user; in [00:07:04] our case we will have two
intents, which [00:07:06] is very limited: the user either wants to [00:07:08] find information about a course, or [00:07:11] the user wants to enroll in a class. [00:07:15] These are two different intentions that [00:07:17] probably have to be detected early in the [00:07:20] conversation. And then you have something [00:07:22] called slots. Slots are used to gather [00:07:26] multiple pieces of information from the user for a [00:07:29] specific intent that the user has. So [00:07:32] let's say the student wants to enroll [00:07:34] in a class; in order to enroll the [00:07:36] student in a class, you need to fill in [00:07:38] several slots: [00:07:39] you need to understand which [00:07:41] class the student is talking about, which [00:07:45] quarter the student wants to enroll in [00:07:47] the class, which year the student is [00:07:49] talking about, and eventually you want to [00:07:51] know the SUID of the student; but [00:07:55] probably we can assume that the SUID is [00:07:57] already encoded in the conversation or the [00:07:59] environment we're in. So those are three [00:08:03] vocabulary terms, and we're also going to talk [00:08:05] about turns for conversational [00:08:08] assistants. A single-turn [00:08:11] conversation is when there is just a [00:08:14] user utterance and a response; multi-turn [00:08:18] is when there are several user [00:08:19] utterances and conversational-assistant [00:08:24] utterances, and you understand that [00:08:28] multi-turn conversations are harder [00:08:30] to handle because we need to track [00:08:32] context. Our assumption today will be [00:08:36] that we work in an environment with [00:08:37] limited intents and slots: we [00:08:40] can define two intents, and for each of [00:08:41] these two intents there are several [00:08:42] slots that we want to fill in. This is going [00:08:44] to make our life easier; of course, in [00:08:48] practice you can have myriads of [00:08:51] intents and slots, and the task [00:08:54] becomes more complicated when you have [00:08:56] more of those. So my first question would [00:09:00] be: how do we detect the intent based on the [00:09:06] user utterance? Can you talk about what [00:09:09] kind of data set you would need to build in [00:09:11] order to train a model to detect the [00:09:12] intent, [00:09:25] or what type of network you would need? There [00:09:37] is not a single good answer, so go for it; [00:09:40] it's your brainstorm. "So I think there [00:09:43] are going to be two options, obviously: [00:09:46] because we have a sequence [00:09:47] coming in, which is the user input, we [00:09:50] might want to use a recurrent neural [00:09:51] network to encode long-term dependencies, [00:09:54] or you might want to use a convolutional [00:09:56] network." Actually, convolutional networks [00:10:00] have some benefits that recurrent [00:10:02] neural networks don't have, and they [00:10:04] might work better; for example, if the [00:10:06] intent we're looking for is always [00:10:08] encoded in a small number of words [00:10:10] somewhere in the input sequence, because [00:10:13] you will have a filter scanning the sequence, and [00:10:15] the filter can detect the intent. So if [00:10:17] you have a filter that was trained [00:10:19] to detect the intent "inform", [00:10:21] another filter trained to detect
the intent and roll then these two filter [00:10:26] intent and roll then these two filter will detect the word enroll or the word [00:10:29] will detect the word enroll or the word I'm looking for and so on in order to [00:10:31] I'm looking for and so on in order to detect the intent okay in terms of data [00:10:35] detect the intent okay in terms of data what you probably need is pairs of user [00:10:38] what you probably need is pairs of user utterances along with the intent of the [00:10:41] utterances along with the intent of the user so you would need to label the [00:10:44] user so you would need to label the datasets like this one with X and input [00:10:46] datasets like this one with X and input I want to so it's padded I want to [00:10:48] I want to so it's padded I want to enroll in CS 106a for winter 2019 to [00:10:50] enroll in CS 106a for winter 2019 to learn coding and this you will label it [00:10:53] learn coding and this you will label it as enroll and notice that enroll here is [00:10:57] as enroll and notice that enroll here is a function so the label is actually [00:11:00] a function so the label is actually noted as a function and the reason is [00:11:02] noted as a function and the reason is because we can call this function in [00:11:04] because we can call this function in order to issue information [00:11:06] order to issue information another example is hi what are the [00:11:08] another example is hi what are the undergraduate level history classes [00:11:09] undergraduate level history classes offered in spring 2018 and this would be [00:11:11] offered in spring 2018 and this would be label as in form so it's probably a two [00:11:15] label as in form so it's probably a two class classification or three classes if [00:11:17] class classification or three classes if you want to add a third class that [00:11:20] you want to add a third class that corresponds to other intents a user [00:11:23] corresponds to other intents a user might 
want to use this chat bot for [00:11:25] might want to use this chat bot for another intent that the chat bar wasn't [00:11:27] another intent that the chat bar wasn't built for so these are the [00:11:30] built for so these are the classes enroll in inform and what's [00:11:32] classes enroll in inform and what's interesting is that if we identify that [00:11:34] interesting is that if we identify that the intent of the user is enroll we [00:11:37] the intent of the user is enroll we probably want to call an API or to [00:11:39] probably want to call an API or to request information from another server [00:11:41] request information from another server and in this case it might be access [00:11:43] and in this case it might be access because the the platform we use to [00:11:45] because the the platform we use to enroll in classes is access and same to [00:11:49] enroll in classes is access and same to retrieve information in order to help [00:11:50] retrieve information in order to help the user about their classes we can [00:11:52] the user about their classes we can probably call explore courses assuming [00:11:55] probably call explore courses assuming that these these services have api's [00:12:00] that these these services have api's these surfaces have api's does that make [00:12:03] these surfaces have api's does that make sense and now the interesting part is [00:12:06] sense and now the interesting part is that the unroll function might request [00:12:08] that the unroll function might request some inputs that you have to identify [00:12:12] some inputs that you have to identify those will be the slots same for the [00:12:14] those will be the slots same for the inform function okay so we could train a [00:12:19] inform function okay so we could train a sequence classifier either convolutional [00:12:21] sequence classifier either convolutional or record and this we're not going to go [00:12:24] or record and this we're not going to go into the details 
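To make the input/output shape concrete, here is a minimal sketch of what such a labeled intent dataset could look like. The keyword matcher is only a toy stand-in for the CNN/RNN sequence classifier discussed above, and the names (`detect_intent`, the third example utterance) are illustrative, not from the lecture:

```python
# Minimal sketch of an intent-classification dataset. Labels name the
# function the chatbot would call (enroll / inform), plus a catch-all
# "other" class. The keyword matcher is a toy stand-in for a trained
# CNN/RNN classifier, shown only to illustrate the input/output shape.

dataset = [
    ("I want to enroll in CS 106A for winter 2019 to learn coding", "enroll"),
    ("Hi, what are the undergraduate level history classes offered in spring 2018?", "inform"),
    ("Please play some music", "other"),  # an intent the bot wasn't built for
]

def detect_intent(utterance):
    """Toy intent detector: a real system would use a trained sequence model."""
    text = utterance.lower()
    if "enroll" in text:
        return "enroll"
    if any(kw in text for kw in ("what", "which", "when", "offered")):
        return "inform"
    return "other"

for utterance, label in dataset:
    assert detect_intent(utterance) == label
```

This mirrors the lecture's point about convolutional filters: a filter that fires on "enroll" (or on "what ... offered") is enough to separate the two intents when they are encoded in a few words of the input.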
[00:12:24] You've learned in the sequence models course how to do that. Now, how do we detect the slots? In terms of data, it's going to look very similar to the previous case, but now we have a sequence-to-sequence problem, where the user utterance is a sequence of words and the slot tags are also a sequence. For example: "Show me the Tuesday fifth of December flights from Paris to Kuala Lumpur." If you were to build a conversational assistant for flight booking, then the label you want is probably something like that — it doesn't have to be exactly this. But why do we denote O for some of the words? The sequence is B-day, I-day, B-dep, B-arr, and so on — what do you think these correspond to, and why do we need that? You've probably seen this in the sections a few weeks back. So why do we denote these labels in this particular format?

[00:13:51] [Student answer] Yeah, correct — I agree with what you said for day, dep and arr: these encode the day, the departure and the arrival. How about the B, the I and the O — someone has an idea? [Student: B is the beginning; sometimes these things span more than one word.] Yeah, exactly. B denotes beginning, I denotes in or inside, and O denotes out or outside. So what happens is that sometimes you will have a slot which is filled by several words, not a single word, and you want to be able to detect this entire chunk — it's called chunking. So you use a special encoding in order to identify whether a word is the beginning of a chunk that fills a slot, or inside it, or outside of it. Day, departure and arrival are three possible slots that we want to fill in order to be able to book the flight; if you don't receive these slots, you might want to have your chatbot request them later.

[00:15:01] Okay, so another example. The classes here can be day, departure, arrival, class — like whether you want to travel in economy or business — and the number of passengers you want on your flight. For our chatbot here it would be: "Hi, I want to enroll in CS 106A for winter 2019 to learn coding", and we would encode it with the beginning of the code of the class, the beginning of the quarter, and the beginning of the year — that would be a possible encoding. And then you would train, probably using a recurrent neural network, an algorithm to predict all the tags. Does that make sense? So now we have already two models running on our chatbot: one for the intents and one for the tags.

[00:15:56] What do you think about joint training? Do you think it's something we could do here? And what do I mean by joint training?
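As an aside, the B/I/O chunking scheme just described can be sketched in a few lines — a small helper that recovers multi-word slot values from a tag sequence. The exact tag names (B-day, B-dep, B-arr) are one possible labeling, as noted above, not a fixed standard:

```python
# Recovering slot chunks from a BIO tag sequence ("chunking").
# B-xxx marks the beginning of a slot value, I-xxx continues it, O is outside.

def bio_chunks(tokens, tags):
    """Group tokens into (slot, value) chunks according to their BIO tags."""
    chunks, current_slot, current_words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new chunk begins
            if current_slot:
                chunks.append((current_slot, " ".join(current_words)))
            current_slot, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_slot == tag[2:]:
            current_words.append(token)     # the chunk continues
        else:                               # O (or an inconsistent I-) ends it
            if current_slot:
                chunks.append((current_slot, " ".join(current_words)))
            current_slot, current_words = None, []
    if current_slot:
        chunks.append((current_slot, " ".join(current_words)))
    return chunks

tokens = "show me the Tuesday fifth of December flights from Paris to Kuala Lumpur".split()
tags   = ["O", "O", "O", "B-day", "I-day", "I-day", "I-day", "O", "O",
          "B-dep", "O", "B-arr", "I-arr"]
print(bio_chunks(tokens, tags))
# -> [('day', 'Tuesday fifth of December'), ('dep', 'Paris'), ('arr', 'Kuala Lumpur')]
```

Note how "Tuesday fifth of December" and "Kuala Lumpur" come back as whole chunks — exactly why the B/I distinction is needed for multi-word slot values.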
[00:16:15] [Student] Training on all the different codes — like training to detect the quarter, the year and the class — or training a separate network for each of them? Is that the joint element of the training?

No, I was talking more about training for different tasks: intent classification — the inform intent and the enroll intent — and slot tagging. Here we have one intent classifier, which takes an input sequence and outputs a single class, and we have a slot tagger, which takes exactly the same input and tags every single word in the sequence. So probably we can use joint training in order to train one network that is able to do both, and this network will be jointly trained with two different loss functions: one for the intent and one for the slot tagger. It's usually helpful to jointly train two networks, especially in the earlier layers, because you end up learning the same type of features — that's interesting for natural language processing.

[00:17:21] [Student] Is there a single loss function for both — do you calculate both losses and sum them together, or is there a trade-off between them?

So the question is: how would you describe the loss function in this joint training, since there are actually two loss functions? You would just sum the two loss functions you are using, and hope that backpropagation will train both networks. The networks will probably have a common base and then be separated afterwards. So let's say you have a first LSTM layer that encodes some information about the user utterance; this layer then gives its output to two different networks, which will be trained separately. Okay. And the classes here are the code of the class, the quarter, the year, and the SUID.
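The summed joint objective just described can be sketched with toy numbers. The probabilities below are made-up model outputs (not from the lecture), just to show the arithmetic of adding an intent loss and a per-token slot loss:

```python
import math

# Sketch of the joint-training loss: a shared encoder feeds two heads, and we
# simply sum the two cross-entropy losses before backpropagating. The
# probabilities are made-up model outputs, only to show the arithmetic.

def cross_entropy(p_correct):
    """Negative log-probability assigned to the correct class."""
    return -math.log(p_correct)

# Head 1: intent classifier — one prediction per utterance.
intent_loss = cross_entropy(0.9)         # p(enroll) for a true "enroll" utterance

# Head 2: slot tagger — one prediction per token; average the per-token losses.
per_token_p = [0.8, 0.95, 0.7, 0.99]     # p(correct tag) for each of 4 tokens
slot_loss = sum(cross_entropy(p) for p in per_token_p) / len(per_token_p)

# Joint objective: just the sum (a weighted sum is also common in practice).
joint_loss = intent_loss + slot_loss
print(round(joint_loss, 4))
```

Backpropagating this single scalar updates both heads and, through them, the shared encoder — which is why the early layers end up learning features useful for both tasks.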
[00:18:12] Assuming the SUID is already in the environment, we will not need to request it. So, can you tell me how to acquire this data, now that we've seen it? Take about a minute to discuss with your neighbors how to acquire that type of data, and then answer on Menti.

[00:18:33] Okay, so let's go over some of the answers. "Mechanical Turk — have people manually collect and annotate the data." That's true. As we discussed earlier in the quarter, this is probably the most rigorous method when it's applied with a specific labeling process and data collection process. It will take more time: you would have to build a UI, a user interface, for the labelers to be able to label all this data, which is not trivial in general — Amazon Mechanical Turk, or a large number of Stanford students. "Have a human chat assistant serve the users, and enter the data by hand as labeled data." "You can start with hand labeling, and probably also generate some data by substituting different courses, quarters and other tags." Oh, that's a good idea — who wrote that? Someone wants to comment?

[00:19:45] Yeah, that's a good idea, so let me repeat it for the SCPD students. We already have a bunch of possible dates — we can easily find a list of dates. You've done it in one assignment, right, where you were using neural machine translation to translate human-readable dates into machine-readable dates; so we have datasets of dates, and we could use those. We also have a list of courses that we can probably find on ExploreCourses; we know that there are not too many quarters; and we probably have databases for any other tag — a list of possible SUIDs, which are seven-figure numbers, something like that. And then we can take sentences with blank spots where we insert these values, and we can generate a lot of data using this insertion scheme, automatically — and every time we insert, we can label, because we know what we inserted.

[00:20:41] I like this idea as well: "Use a part-of-speech tagger or a named-entity-recognition model to identify examples of requests that are found elsewhere." One thing we discussed in section is that there are models available to do part-of-speech tagging, right? So why don't we use them? They are trained really well, and we could take user utterances that we collected online and tag them automatically using these good models. Of course it's not going to be perfect, but we can at least get started with that, and leverage a model that someone else has built to tag and label our dataset. Okay, good ideas here.

[00:21:28] So let's see the data generation process, which is probably the easiest strategy to start with. Talking about the flight-booking virtual assistant: we would have a database of all the departure locations — Paris, London, Kuala Lumpur — and of arrivals as well, so these are lists of cities in the world that have airports. We would have a list of ways to write dates, and also classes — business, economy, economy plus, premium, whatever you want — and user utterances. Then what we do is pull a user utterance from the database, such as: "I would like to book a flight from [dep] to [arr] in [class] class for [day]", and plug in, randomly from the datasets, slots that make sense. We can generate a lot of data using this process: this single user utterance can be augmented into virtually tens or hundreds of different combinations.

[00:22:49] So that's one way to augment your dataset automatically and label it. But you also need hand-labeled data, because you don't want your model to overfit to this specific type of user utterance. Same for our virtual assistant for the university: "Hi, I want to enroll in [code] for [quarter] [year]", and then we can insert the quarter, the year, and the code of different classes from the database, so that we can train our network on that. Does this data augmentation make sense? These are common tricks you will see in various papers, and this is an example of one of them. Okay — so we can label automatically when inserting, and we can train a sequence-to-sequence model in order to fill in the slots. So let's go on Menti and start the competition: which one is the most fun?
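The insertion scheme above can be sketched as a small generator that fills a template from value lists and emits the BIO tags automatically, since we know what we inserted. The template, slot names and example values are illustrative, not an actual course database:

```python
import random

# Sketch of the insertion scheme: plug database values into a template and
# emit BIO slot tags automatically. Template and values are illustrative.

database = {
    "code":    ["CS 106A", "CS 230", "HISTORY 51"],
    "quarter": ["winter", "spring", "autumn"],
    "year":    ["2018", "2019"],
}
template = "hi I want to enroll in {code} for {quarter} {year}"

def generate_example(rng):
    """Return one (tokens, tags) training pair with auto-generated BIO labels."""
    tokens, tags = [], []
    for piece in template.split():
        if piece.startswith("{"):                    # a slot placeholder
            slot = piece.strip("{}")
            words = rng.choice(database[slot]).split()
            tokens += words                          # insert the value's words
            tags += ["B-" + slot] + ["I-" + slot] * (len(words) - 1)
        else:
            tokens.append(piece)
            tags.append("O")                         # non-slot words are Outside
    return tokens, tags

rng = random.Random(0)
tokens, tags = generate_example(rng)
print(list(zip(tokens, tags)))
```

With 3 codes, 3 quarters and 2 years, this one template already yields 18 distinct labeled utterances — which is the "tens or hundreds of combinations" point, and also why hand-labeled data is still needed to avoid overfitting to the template's phrasing.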
[00:24:05] user a chance I want to enroll in CS 106 a winter 2019 to learn coding it [00:24:08] a winter 2019 to learn coding it identifies the intent of the user using [00:24:11] identifies the intent of the user using sequence classifier same type of network [00:24:14] sequence classifier same type of network as you've built for the mo GFI [00:24:15] as you've built for the mo GFI assignment and then it also runs another [00:24:19] assignment and then it also runs another algorithm which will fill in the slots [00:24:21] algorithm which will fill in the slots and here we have all the slots needed we [00:24:25] and here we have all the slots needed we have the code for the class we have the [00:24:26] have the code for the class we have the quarter and we have the year thus unit [00:24:28] quarter and we have the year thus unit ID is implicitly given so we're able to [00:24:31] ID is implicitly given so we're able to enroll to enroll the students by calling [00:24:33] enroll to enroll the students by calling access with all these slots done now [00:24:36] access with all these slots done now let's make it a little more complicated [00:24:38] let's make it a little more complicated let's say the students say hi I want to [00:24:40] let's say the students say hi I want to enroll in CS 106 a 2 to learn coding so [00:24:45] enroll in CS 106 a 2 to learn coding so the difference between these utterance [00:24:47] the difference between these utterance and the previous one example one is that [00:24:49] and the previous one example one is that you don't have all the slots you [00:24:52] you don't have all the slots you identify with your slots tagger that's [00:24:55] identify with your slots tagger that's CS 106 a is the coder of the class but [00:24:57] CS 106 a is the coder of the class but you don't know the culture you don't [00:24:59] you don't know the culture you don't know the year so you probably want your [00:25:01] know the year so you probably want your 
chat bot to get back to the to the [00:25:02] chat bot to get back to the to the student and say for which quarter would [00:25:04] student and say for which quarter would you like to enroll right and the student [00:25:08] you like to enroll right and the student would hopefully say winter 2019 or [00:25:10] would hopefully say winter 2019 or winter and then you have to ask for the [00:25:12] winter and then you have to ask for the year 2019 and finally you can say for [00:25:15] year 2019 and finally you can say for sure I just enrolled you so we're not [00:25:18] sure I just enrolled you so we're not making any assumption here on natural [00:25:19] making any assumption here on natural language generation you've worked on a [00:25:21] language generation you've worked on a Shakespeare assignment where you [00:25:23] Shakespeare assignment where you generate Shakespeare like sentences in [00:25:25] generate Shakespeare like sentences in fact a good shot boat would have this [00:25:28] fact a good shot boat would have this feature of generating language but for [00:25:30] feature of generating language but for our purpose which can just hard code [00:25:32] our purpose which can just hard code that when you're able to enroll the [00:25:33] that when you're able to enroll the students you just say I just enrolled [00:25:35] students you just say I just enrolled you when you were able to retrieve [00:25:37] you when you were able to retrieve information from the students you would [00:25:38] information from the students you would just write here is some information and [00:25:40] just write here is some information and you would plug in whatever the explore [00:25:42] you would plug in whatever the explore course is API sent back in a JSON okay [00:25:46] course is API sent back in a JSON okay so here the idea is this student [00:25:49] so here the idea is this student utterance cannot be understood without [00:25:52] utterance cannot be understood without context 
there is no way to understand [00:25:54] "winter 2019" if you don't have a context management system. Does it make sense? So we want to build that context management system, and then the question is how to handle context. There are many ways to do that, and people are still searching for the best ways. One way is to handle it with reinforcement learning, as you mentioned earlier. Another way, which is quite intuitive and closer to what we've seen together in the sequence models module, module five, is this type of architecture, taken from Chen et al., "End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding" — so now you're able to understand what multi-turn and end-to-end memory network mean.

[00:26:42] So what happens here, just to describe it, is we will save all the history utterances: from the beginning of the conversation we will record all the utterances and messages exchanged between the user and the assistant, and keep them in a storage that will be called "history utterances." c is the current utterance. So let's say the student says "winter 2019" — this is the utterance of the student at this point. This utterance will be run through an RNN, and we will get back an encoding of this sentence. There's all the word embedding stuff that I won't describe, but you guys are used to it: we use word embeddings, we run them through an RNN, and we get back the encoding u of the user utterance. This encoding will then be compared to what we have in memory: all the user utterances that we had in memory are also going to be run through an RNN that will encode their information in vectors.

[00:27:49] These vectors are going to be put in a memory representation, and u will be compared to them directly by inner product: we take an inner product of u with all the memories, and this, pooled into a softmax, gives us a vector of attention — which you guys should be used to by now — a knowledge attention distribution telling us where we should put our attention in the memory for this specific utterance. Does that make sense? So a simple inner product plus softmax gives us a series of weights. Then we take a weighted sum of the memories, multiplied by these attention weights, and it gives us a vector that encodes the relevance of the memory with regard to our current utterance. This is then summed with u and run through a simple matrix multiplication to get an output vector, which is fed into the slot tagging sequence; and usually — it's experimental — they also pass the current utterance directly to the RNN tagger.
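The attention step just described — inner product with the memories, softmax, weighted sum, then an output projection — can be sketched in a few lines of numpy. The dimensions, the random vectors standing in for learned RNN encodings, and the single projection matrix are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sizes: d-dim sentence encodings, n past utterances.
d, n = 8, 5
rng = np.random.default_rng(0)

u = rng.normal(size=d)        # RNN encoding of the current utterance ("winter 2019")
M = rng.normal(size=(n, d))   # RNN encodings of the n history utterances (the memory)
W = rng.normal(size=(d, d))   # learned output projection (stand-in)

# Inner product of u with every memory, pooled through a softmax:
# the knowledge attention distribution over the history.
p = softmax(M @ u)            # shape (n,), sums to 1

# Weighted sum of the memories by the attention weights.
h = p @ M                     # shape (d,)

# Summed with u and passed through a simple matrix multiplication
# to get the output vector fed to the slot tagger.
o = W @ (h + u)               # shape (d,)
```

In the paper this output is consumed by the RNN slot tagger together with the current utterance; here it is just a vector you could concatenate into the tagger's input.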
[00:28:58] And the RNN tagger comes up with the slot tagging. So using that, you can understand that "winter 2019" is actually the target for the slot "quarter," because you have this memory network. Does it make sense? So this is another type of attention model you may want to use, and this memory network can be used to manage some context for the slot tagger. Okay.

[00:29:27] So just to recap: we have our example, "Hi, I want to enroll in a class," and we detect the intent, which is enroll. We also detect that there are some slots missing, because we know that the enroll function needs the quarter, the year, and the class in order to be called, so we have to ask for those. We probably hard-coded the fact that if you don't have the quarter, the year, and the class, you first want to ask for the class or the quarter or the year; then
you can get back to the person by asking which class they want to enroll in. [00:30:00] The person gets back to you, you use your memory network to understand that "CS 230" is a slot for the enroll intent, and you fill it in. So now we have our intent, with class = CS 230, and we have our slots quarter and year, which are still to be filled. The chatbot gets back asking for the quarter, and hopefully the student gives you the year at the same time, so you can fill in the slots — and then you are enrolled in CS 234, winter 2019. Yeah, it should be spring — yeah, this chatbot is not trained very well. Okay, any questions on that? [00:30:44] So this is a very simple case of a conversational assistant, just to give you some ideas. There are some papers listed in the presentation that you can go to in order to get more advanced research insights. But the idea here is that we're limited to a specific intent — to two specific intents — and a few slots.
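The hard-coded "ask for whatever slot is missing, then call the API" logic in the recap can be sketched as a tiny dialogue manager. The intent name, slot names, and prompts below are invented for illustration:

```python
# Toy dialogue manager for a single hard-coded intent, as in the recap:
# the "enroll" intent needs class, quarter, and year before the API call.

REQUIRED_SLOTS = {"enroll": ["class", "quarter", "year"]}

PROMPTS = {
    "class": "Which class do you want to enroll in?",
    "quarter": "For which quarter?",
    "year": "Which year?",
}

def next_action(intent, slots):
    """Ask for the first missing slot, or make the API call when all are filled."""
    missing = [s for s in REQUIRED_SLOTS[intent] if s not in slots]
    if missing:
        return PROMPTS[missing[0]]
    return f"API call: {intent}({slots})"

state = {}
print(next_action("enroll", state))                    # asks for the class first
state["class"] = "CS 230"                              # filled in by the slot tagger
state["quarter"], state["year"] = "winter", "2019"
print(next_action("enroll", state))                    # all slots filled -> API call
```

A real system would plug the intent classifier and slot tagger in where the dictionary updates are done by hand here.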
[00:31:04] What do you think we would need if we didn't restrict ourselves to specific intents and slots? [00:31:24] It's a very complicated, tough one. An industrial way to do it is to use a knowledge graph. What it means is: let's say you're an e-commerce platform. You probably have a knowledge graph of all the items on the platform, with connections among them. Let's say you have a shoe. A shoe is a slot that might be the object for the intent "I want to buy something," right? The shoe can have several attributes, like color, or size, or men or women — gender — and all of these are connected together in a gigantic knowledge graph, and you will follow the paths of this knowledge graph following some probabilities. So when we detect the intent of the user, which is "buy something,"
[00:32:24] we could identify the object — "I want to buy a shoe" — and then, based on our knowledge graph, the next question we should ask, or the next slot we need to fill, is which brand you want your shoe to be. And so the knowledge graph is going to tell you: with 60% probability, go to "brand" and ask about the brand. Once you're there, what other information do you need in order to be able to retrieve five results for the user to review? And so on. So the knowledge graph is something in this field that can be used in order to have multiple intents, and multiple slots for every intent. Okay. And at the end we can make an API call here, with class CS 230, quarter winter, year 2019, and the SUID. Okay.

[00:33:11] Another question I have for you is how to evaluate the performance of a chatbot. What do you think? [00:33:33] So, there are common ways to evaluate several parts of your pipeline.
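The knowledge-graph idea from a moment ago can be sketched as a graph with probability-weighted edges; here the assistant greedily follows the most probable edge to a slot that is not yet filled (a real system might sample the edge instead). The items, attributes, and probabilities are all invented:

```python
# Toy knowledge graph for the e-commerce example: after detecting the
# "buy" intent with object "shoe", edges carry probabilities telling
# the assistant which slot to ask about next.

graph = {
    "shoe": [("brand", 0.6), ("color", 0.25), ("size", 0.15)],
    "brand": [("size", 0.7), ("gender", 0.3)],
}

def next_slot(node, filled):
    """Follow the highest-probability edge to a slot not yet filled."""
    candidates = [(s, p) for s, p in graph.get(node, []) if s not in filled]
    if not candidates:
        return None  # nothing left to ask: ready to retrieve results
    return max(candidates, key=lambda sp: sp[1])[0]

print(next_slot("shoe", set()))        # 'brand' — the 0.6 edge wins
print(next_slot("shoe", {"brand"}))    # 'color' — next most probable
```

Taking the arg-max edge is a simplification of "go to brand with 60% probability"; swapping `max` for a weighted random choice would match the probabilistic phrasing.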
[00:33:39] How is your slot tagger doing, for example? How is your intent classifier doing? You can use metrics such as precision and recall, or the F1 score for a mix of both, and report those in order to compare how each module is doing inside the chatbot. But ultimately you want to understand how good your chatbot is overall. So some experiments are done — and this is a paper on a deep reinforcement learning chatbot built in 2017 by Serban et al. at MILA — and what they did is use Mechanical Turk in order to evaluate their chatbots, and also to build a scoring system for their reinforcement learning chatbot. So I'm reading the instructions for you: "You will be presented with a conversation between two speakers, speaker A and speaker B. You will also be presented with four potential responses from one of the speakers for this dialogue, and the task is for you to rate each of the
responses between one — inappropriate, doesn't make sense — [00:34:38] and five — highly appropriate and interesting — based on how appropriate the response is to continue the conversation. Three is neutral. If two responses are equally appropriate, you should give them the same score, and if you see a response that is not in English, please give it a score of one." So here is what happens from a user's perspective. You would have a conversation: "You need to work on your English." "Why do you say that about me?" "Well, your English is very poor." So this is the conversation, and then response one is "But English is my native language." Response two is "What other reasons come to mind?" Response three is "Here is a funny fact: 'Go' is the shortest complete sentence in the English language." And then the fourth response is "Bye doggy." So obviously you have to score these responses according
[00:35:41] to how relevant you think they are, and then these scores will be used either for the scoring system of the deep reinforcement learning chatbot, or to evaluate how good your chatbot is compared to other chatbots — so maybe each of these responses comes from a different model. Does that make sense? [00:36:00] So these are a few ways. There's another way, which is asking for the opinion of the user on different responses. Let's say you're a user and you are comparing two chatbots: you can give your opinion on which one you think is more natural, and you would ask a lot of users to do that, to compare two or three chatbots together, and also compare them to natural language from a human. Then, by doing a lot of mean opinion score experiments, you can evaluate which chatbots are better than the others, just comparing them one-on-one. Okay.
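The precision, recall, and F1 metrics mentioned above for evaluating a module like the intent classifier can be computed from scratch in a few lines; the toy gold and predicted labels here are invented:

```python
# Precision, recall, and F1 for a binary intent classifier,
# computed on a toy set of gold and predicted labels (1 = "enroll").

gold = [1, 1, 0, 1, 0, 0, 1, 0]
pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # true positives
fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)  # false positives
fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)                           # 3 / 4 = 0.75
recall = tp / (tp + fn)                              # 3 / 4 = 0.75
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean: 0.75

print(precision, recall, f1)
```

In practice you would report these per intent (and per slot type for the tagger) rather than over one binary label.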
[00:36:46] Now, getting back to one thing that a student mentioned earlier: what if we want to have a vocal assistant? Right now our assistant is not vocal, it's just text. What other things do we need to build in order to make it a vocal assistant? [00:37:04] We're not going to go into the details, but roughly you would need a speech-to-text system, which will take the voice of the user and convert it into text — and this, as you've seen in the sequence models class, has different steps in its pipeline — and also a text-to-speech system, which takes the text from the chatbot and converts it into a voice. That's how you get virtual assistants talking to us: they have a text-to-speech system running. And these are three papers: the first one is Deep Speech 2 from Baidu's team, which built end-to-end speech recognition in English and Mandarin, and the two
others are text-to-speech synthesis: [00:37:47] one came out in February 2018, which is Tacotron 2, and the second one is WaveNet, which is a very popular generative model. These are far beyond the scope of the class, but you can study them in other classes at Stanford that are more specific to speech. Okay, class project advice. [00:38:07] So this Friday we're going to go over again the rubric of what we look at when we grade projects, and here is the list of things we will look at. Make sure you have a very good problem description: when you read papers you see that there is a very good abstract, and we expect you to give us a very good abstract, so that when we read it we get a good understanding of the paper. Hyperparameter tuning: always report what you did. You don't need to be very exhaustive, but just tell us which hyperparameters you've been choosing, and which ones you've
been testing, [00:38:41] and why they didn't work. Another thing we look for is typos — this is common in the grading scheme — typos and clear language, so peer review your paper. Explanation of choices: this is a very important part. We expect you to explain the decisions you're making. We don't want you to tell us "I've made that decision" without explaining, but rather: "There is this paper that mentioned this architecture worked well on that specific task; I've tried three architectures, here are my hyperparameters and results, and that's why I'm going to dig more into that one," and so on. Data cleaning and preprocessing: if applicable to your project, explain it. How much code you wrote on your own: it's important to us, so please submit your GitHub, or share it privately with the TAs when you submit
your projects — [00:39:38] it's going to make it easier for us to review the code. Insights and discussions: include the next steps — what would you have done if you had more time? — and also interpret your results. Don't just give results without explanation; rather, try to extract information from these results, which can also drive your next steps. Results are important, but if you don't have the results you expected, it's fine: we will look at how much work you've done, and some tasks are very complicated — we don't expect you to beat state-of-the-art on every single task. Some of you are going to beat state-of-the-art, hopefully, but those of you who don't should still report all your results and explain why it didn't work. Give references. And there is a penalty for more than five pages: if you're working on a theoretical project you can add additional pages as an appendix — you can also add an appendix for your
project, [00:40:36] but the core has to be five pages. And for the final poster presentation, which will happen not this Friday but the next one, we will ask you to pitch your project in three minutes. Not everyone in the group has to talk — at least one person has to talk, and we prefer if several of you talk — but you have three minutes to pitch your project, so prepare the pitch in advance, and you will have two minutes of questions from the TAs, which are also part of the grade. [00:41:06] Okay, finally: what's next after CS 230? There are a ton of classes at Stanford; we're in a great learning environment. Next steps can be, within the university, classes in natural language processing and computer vision, but also classes from different departments. Deep generative models is a good way to learn about text-to-speech, for example, or GANs. Probabilistic graphical models is also a
very important class in the industry. [00:41:36] From the CS department, of course, if you haven't taken them yet: CS 229 Machine Learning, or CS 229A Applied Machine Learning, are two ways to go to learn machine learning. Reinforcement learning is a class where you can delve more into Q-learning, policy gradients, and all these methods that sometimes use deep learning. We're going to publish that list in case you want to check it, but these are examples of classes you can take, and of course there are other classes, not mentioned here, that might be relevant to pursue your learning in deep learning and machine learning. [00:42:12] Okay, that said, I'm going to give the microphone to Andrew for closing remarks. Yeah, good luck on your projects — we'll see you on Friday for the discussion sections, and next week for the final project. [Andrew takes the microphone.] Oh — so, all
right, [00:42:41] here we are at the end of this class — nearly at the end of this class. [00:42:45] You know, the NeurIPS conference is taking place right now — formerly the NIPS conference, renamed to NeurIPS — and I remember, it was ten years ago that a PhD student presented a workshop paper at NIPS telling people: hey, consider using GPUs and CUDA — which was a new thing that NVIDIA had just published — to train your networks. And we had done that work on a GPU server that Ian Goodfellow, the creator of GANs, had built in his dorm room when he was an undergrad at Stanford. So our first GPU server was built in a Stanford undergrad's dorm. And I remember sitting down with Geoff Hinton and saying, hey, check out this CUDA thing, and Geoff said, no, GPU programming is really hard — but then: oh, maybe this
CUDA thing [00:43:37] looks promising. And I tell the story because I want you to know, as Stanford students, that your work can matter, right? When Ian Goodfellow built that GPU server in his dorm room, I had no idea — and I don't know if he realized — that a decade later someone would be winning several hundred hours of AWS credits to train bigger deep learning algorithms. But I think here at Stanford University we're very much at the heart of the technology world — I think Silicon Valley is here in large part because Stanford University is here — and we live in a world where, with the superpowers that you now have, you have a lot of opportunities to do new and exciting work, which may or may not seem like it matters in the short run — it may even seem inconsequential in the short run — but can have a huge impact in the long run. [00:44:37] A couple of weekends ago — so, um, my wife
roast coffee beans at home right my [00:44:42] we roast coffee beans at home right my wife buys raw coffee beans and then we [00:44:44] wife buys raw coffee beans and then we actually roast them and camera or my [00:44:46] actually roast them and camera or my wife tends to roast em and its really [00:44:48] wife tends to roast em and its really cheap popcorn popper that we have right [00:44:50] cheap popcorn popper that we have right now so I don't know I don't have much [00:44:53] now so I don't know I don't have much coffee you guys drink I drink a lot of [00:44:54] coffee you guys drink I drink a lot of coffee and so you know so Carol byesies [00:44:57] coffee and so you know so Carol byesies being coffee bean see she puts them in [00:44:59] being coffee bean see she puts them in this like cheap popcorn popper which is [00:45:01] this like cheap popcorn popper which is made for popping popcorn not made for [00:45:03] made for popping popcorn not made for rose and coffee beans this is one of the [00:45:04] rose and coffee beans this is one of the standard cheap ways to roast coffee [00:45:06] standard cheap ways to roast coffee beans and and I love my wife I drink the [00:45:09] beans and and I love my wife I drink the coffee she makes but sometimes she burns [00:45:10] coffee she makes but sometimes she burns the coffee beans so I found this article [00:45:13] the coffee beans so I found this article on the internet from a former student [00:45:16] on the internet from a former student that written an article and how they use [00:45:19] that written an article and how they use machine learning to roast to optimize [00:45:22] machine learning to roast to optimize the roasting of coffee beans as I [00:45:24] the roasting of coffee beans as I forwarded to the Carol she wasn't very [00:45:27] forwarded to the Carol she wasn't very happy about that and but I raised this [00:45:31] happy about that and but I raised this is another example of how all of you you 
[00:45:37] know, I would never have thought of applying machine learning to roasting coffee beans. I like my coffee, but it had never occurred to me to do that. But someone taking a machine learning class, like you guys are, went ahead and came up with a better way of roasting coffee beans using learning algorithms. [00:45:57] And again, I don't know if the person who wrote this blog post was thinking of building a business out of it; there might be a business, there might not, or it might be just a fun personal hobby, I actually don't know. But all of you with these skills have that opportunity. [00:46:09] And then again, earlier this week, was it Monday night, a group of us were actually in the Gates building, where a bunch of students from the AI for Healthcare boot camp that Kian alluded to
[00:46:28] were going over some of the final projects for the students in the AI for Healthcare boot camp that we're working on. [00:46:34] And I think I actually met several people, including Aarti, when she first participated in a much earlier version of that boot camp (you can also ask Aarti or others if you're interested). [00:46:47] One of the master's students I was working with, who sees patients in primary care (I think he's been in this class), was demoing an app where you could pull up an x-ray film, take a picture with your cell phone, upload the picture to a website, and have the website read the x-ray and suggest a diagnosis for the patient. [00:47:09] Most of the planet today has insufficient access to radiology services. There are many countries where it costs you three months of salary to go and get an x-ray taken, and then maybe try to find a radiologist to read it. [00:47:24] But most of
the planet, billions of people on this planet, do not have sufficient radiology services. [00:47:32] And while the Stanford AI for Healthcare boot camp is still a research project (actually, you worked on the CheXNet paper, didn't you? Yes, several of the TAs are co-authors on these papers), it may be that work done here at Stanford is taking the first steps: maybe, if we can improve the deep learning algorithms, pass the regulatory hurdles, and prove safety, that type of work happening here at Stanford on healthcare will have a transformative effect on how healthcare is run around the world. [00:48:11] The skills that you guys now have are a very unique set of skills. There are not that many people on the planet today who can apply learning algorithms and deep learning algorithms the way that you can, and a lot of the ideas
[00:48:27] that you learned in this class were invented in the last year or two, so there just has not yet been time for these ideas to become widespread. [00:48:35] And if I look at a lot of the most pressing problems facing society, be it lack of access to healthcare, or climate change (I spend a lot of time thinking about climate change), or whether we can improve access to education, or whether we can just make the whole of society run more efficiently, [00:48:53] I think that all of you have the skills to do very unique projects. And I hope that as you graduate from this class (I'm sure some of you will create businesses and make a lot of money, and that's great) [00:49:06] all of you will also take the unique skills you have to work on the projects that matter the most to other people, that help other people. Because if one of you does not take your
skills to do something meaningful, then there is probably some very meaningful project that just no one is working on, [00:49:23] because I think the number of meaningful projects greatly exceeds the number of people in the world today who are skilled at deep learning. That is why all of you have a unique opportunity to take these algorithms that you now know about and apply them to anything: [00:49:37] from developing novel chatbots, to improving healthcare (my team at Landing AI is improving manufacturing and agriculture, also some healthcare), to maybe helping with climate change, to helping with global education, and any other problems that really matter. [00:49:58] So I hope that all of you go on to do work that matters. [00:50:04] And then one last story. A few months ago I got to drive a tractor. It was very big, a little bit scary; it feels like a bigger machine than I should
be qualified to drive; it's a huge tractor. [00:50:24] And it turns out that when you drive a normal car, it's really clear which way is up on the steering wheel: you point the steering wheel up and your car drives forward. [00:50:35] For this huge tractor that I got to drive, it turns out it has this giant steering wheel, and to drive straight the giant steering wheel was just oriented at some weird angle; to turn right you turn it clockwise, to turn left you turn it anticlockwise, and that was that. [00:50:50] So it was a lot of fun. I drove the tractor, made a u-turn, drove back to where I started, did not hit anyone, there was no accident, [00:51:05] and then I climbed down off this giant tractor. And maybe I tell that story because I hope that even while you are doing
this important, maybe beneficial-to-other-people sort of work, I hope you also have fun. [00:51:20] I feel really privileged that, as a machine learning engineer, some days I get to go drive a tractor. [00:51:32] And one of the most exciting things: I feel like a lot of the biggest untapped opportunities for AI lie outside the software industry. I'm very proud of the work I helped do leading the Google Brain team and Baidu AI, and I think more people should do that type of work. [00:51:54] And I think that here in Silicon Valley many of you will get jobs in the tech sector, and that's great; we need more people to do that. [00:52:00] But I also think that if you look at all of human activity, the majority of human activity is actually outside the software industry; the majority of global GDP is
actually generated outside the software industry. [00:52:14] And I would just urge you, as you are considering what the most meaningful work is: consider the software industry, but also look outside the software industry, because I think the biggest untapped opportunities for AI really lie outside the software industry. [00:52:29] And we can't have everyone doing the same thing; that's actually not a healthy plan, if everyone works on improving web search, or even on improving healthcare. [00:52:40] I think we need a world where all of you who have these skills share these skills, teach other people what you've learned, and go out to do work that affects the software industry, affects other industries, affects for-profits, nonprofits, and governments, but uses these AI capabilities to lift up the whole human race. [00:52:58] And then finally, the last thing I want to say, on behalf of Kian
and me and the whole teaching team, is that I wanted to thank you for your hard work on this course. [00:53:10] I know that watching the videos, doing the homeworks on the website, and coming to the discussion sections, many of you have put a lot of work into this course. [00:53:24] And it wasn't so long ago, when I was a student, that I was staying at home doing homework or trying to derive some math thing. I've also taken some online courses myself, so it's actually not so long ago that I was sitting at a computer much like you, [00:53:37] watching some Coursera videos and then clicking on this, clicking on that, and answering things online. [00:53:42] And Kian and I and the whole teaching team appreciate all the hard work you put into this, and I hope that you got a lot out of your hard work, [00:53:55] and that you will take these rare and unique skills you now have and go on, and when you
[00:54:00] drive away from Stanford, or, for the home viewers as well as the in-classroom viewers, wherever you go, that you take these rare skills you now have and go on to do work that matters, and go on to do work that helps other people. [00:54:13] So with that, I look forward to seeing all of your projects at the poster session. I apologize in advance that we won't really be able to get a deep understanding in three minutes, but don't worry, we do read your project reports. [00:54:27] And I hope you are also looking forward to seeing everyone else's work at the poster session. [00:54:35] So with that, let me just say, on behalf of Kian and me and the whole teaching team: thank you all very much. [00:54:41] [Applause]

================================================================================
LECTURE INDEX.md
================================================================================

CS230 – Deep Learning (Andrew Ng)

Playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rOABXSygHTsbvUz4G_YQhOb

Total Videos: 10
Transcripts Downloaded: 10
Failed/No Captions: 0

---

Lectures

1. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 1 - Class Introduction & Logistics, Andrew Ng
   - Video: [https://www.youtube.com/watch?v=PySo_6S4ZAg](https://www.youtube.com/watch?v=PySo_6S4ZAg)
   - Transcript: [001_PySo_6S4ZAg.md](001_PySo_6S4ZAg.md)
2. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 2 - Deep Learning Intuition
   - Video: [https://www.youtube.com/watch?v=AwQHqWyHRpU](https://www.youtube.com/watch?v=AwQHqWyHRpU)
   - Transcript: [002_AwQHqWyHRpU.md](002_AwQHqWyHRpU.md)
3. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 3 - Full-Cycle Deep Learning Projects
   - Video: [https://www.youtube.com/watch?v=JUJNGv_sb4Y](https://www.youtube.com/watch?v=JUJNGv_sb4Y)
   - Transcript: [003_JUJNGv_sb4Y.md](003_JUJNGv_sb4Y.md)
4. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 4 - Adversarial Attacks / GANs
   - Video: [https://www.youtube.com/watch?v=ANszao6YQuM](https://www.youtube.com/watch?v=ANszao6YQuM)
   - Transcript: [004_ANszao6YQuM.md](004_ANszao6YQuM.md)
5. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 5 - AI + Healthcare
   - Video: [https://www.youtube.com/watch?v=IM9ANAbufYM](https://www.youtube.com/watch?v=IM9ANAbufYM)
   - Transcript: [005_IM9ANAbufYM.md](005_IM9ANAbufYM.md)
6. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 6 - Deep Learning Project Strategy
   - Video: [https://www.youtube.com/watch?v=G5FNYxbW_Qw](https://www.youtube.com/watch?v=G5FNYxbW_Qw)
   - Transcript: [006_G5FNYxbW_Qw.md](006_G5FNYxbW_Qw.md)
7. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 7 - Interpretability of Neural Network
   - Video: [https://www.youtube.com/watch?v=gCJCgQW_LKc](https://www.youtube.com/watch?v=gCJCgQW_LKc)
   - Transcript: [007_gCJCgQW_LKc.md](007_gCJCgQW_LKc.md)
8. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 8 - Career Advice / Reading Research Papers
   - Video: [https://www.youtube.com/watch?v=733m6qBH-jI](https://www.youtube.com/watch?v=733m6qBH-jI)
   - Transcript: [008_733m6qBH-jI.md](008_733m6qBH-jI.md)
9. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 9 - Deep Reinforcement Learning
   - Video: [https://www.youtube.com/watch?v=NP2XqpgTJyo](https://www.youtube.com/watch?v=NP2XqpgTJyo)
   - Transcript: [009_NP2XqpgTJyo.md](009_NP2XqpgTJyo.md)
10. Stanford CS230: Deep Learning | Autumn 2018 | Lecture 10 - Chatbots / Closing Remarks
    - Video: [https://www.youtube.com/watch?v=IFLstgCNOA4](https://www.youtube.com/watch?v=IFLstgCNOA4)
    - Transcript: [010_IFLstgCNOA4.md](010_IFLstgCNOA4.md)