# Forensic Anthropology 2011 : 06 : Empirical Modeling

[ Music ]

>>Good morning. Today I'm going to give two lectures before we go to the workshops, and the first one explains a little bit of the empirical modeling we used in facial reconstruction, for better prediction of the markers. So, I will give a quick outline here. There is some mathematics here, but I will try to explain it as best as I can, and if you have questions, please go ahead and ask me. So, here, basically,

the outline is basic. I'm going to explain the empirical models versus the theoretical, a bit of modeling and machine learning, the history, the applications, the types, and then regression models, basically parametric and nonparametric modeling. The reason we wanted to explain the history, and a little bit about the mathematics that goes into the next lecture, is to give you an idea of where we came up with the numbers that are used to help in the facial reconstruction.

So, empirical statistics is calculating probability or information about an event from past experimental data; it is determined by data from an actual experiment. Theoretical statistics, on the other hand, is calculating probability or information about an event based on a sample space of equally likely outcomes, determined by finding all the possible outcomes theoretically and calculating how likely the given outcome is. That's the difference between the two: for the empirical approach you really need to have an experiment, and you take the output from this experiment and use it in your models. So, machine learning is a term

that I want to explain here, because some of the kernel regression that we use in the facial reconstruction has strong roots in machine learning, and that's in computer vision and some of the work we do. Machine learning is basically learning changes in a system that are adaptive, in the sense that they enable the system to do a task, or tasks drawn from the same population, more effectively the next time. The whole idea of machine learning, as you can see, is that it's always driven by empirical data from the real world, and the goal is to teach the computer how to use this data to predict future data or recognize certain events or objects. When I say teach the computer, I mean it's algorithm- and software-driven; the computer by itself will not learn. Of course, you have to write the programs for that. So you have the world, you have the prediction, and when you have a new event, then basically you can predict what the new event is. For example, if you teach your computer some patterns, like in this example here, you learn previous patterns, and then something new comes in, and your software should predict whether it's a cat or something else. That's basically what we mean by machine learning. We did this actually in some of

my work, and this slide on the top left is some of the work I have done on the gender knee, where we introduced the concept that men and women differ, which the orthopedic industry for a while denied; I mean, they had one implant, or a number of sizes, meant to fit both men and women. We proved that this is wrong through machine learning, through experimental data: we proved that the medial, the anterior-posterior, and the lateral dimensions are different. That was a simple example. As you can see, the flow from learning here was through model statistics, and the two peaks mean that they are different. So the prediction model takes this data in and then predicts sizes for a new population or a new person; the machine learns and then you predict. So, machine learning is

to optimize a performance criterion using example data or past experience. The role of statistics is inference from a sample, and the role of computer science, really, the part I call the computer here, is efficient algorithms to solve the optimization problem and to represent and evaluate the model for inference. Machine learning is used where human expertise is absent, for example navigating on Mars; where humans are unable to explain their expertise, as in the case of speech recognition, here we use machine learning for vision and language; where the solution changes in time, like the stock market, all those prediction models are machine learning; and where solutions need to be adapted to a particular case, like biometrics. In the case of biometrics, the problem size is too vast for our reasoning capability, and the same for things like calculating page ranks. These are examples of how you use machine learning. Some of the applications probably a lot of you have seen or know before, like looking at patterns of ears and fingerprints; it's used sometimes in imaging and ultrasound for attenuation. Here also is the case of using infrared cameras to look at veins: every person has their own pattern, and it's used in biometrics. So these are some of the very obvious applications of machine learning. Other applications you

of machine learning. Other applications you

may not be aware of, we’re using the same science

as like retail, market– basket analysis, customer

relationship management and biometric, it’s used all

the time, voice recognition, fingerprint, iris and finance and credit scoring,

fraud detection. In manufacturing,

optimization, troubleshooting. In medicine, in medical

diagnosis and telecommunication of course optimizing surf– surface quality and

bioinformatics, the famous and a lot of you know the DNA

sequencing and gene expression, trying to understand

the patterns. And then used by web

mining for search engines like the big Google, all the big

search– you’re using the same– same methods actually. The types of learning machine,

learning can be divided into supervised learning, reinforcement learning, and unsupervised learning. In supervised learning you have classification, where you, as an expert, supervise the classification by providing certain labels because you know the problem, and you have regression. Unsupervised learning is where you really don't have the expert, and you try to understand from the data itself; there you have clustering, association, and then dimensionality reduction, which is for when you have a huge dataset. For dimensionality reduction I mentioned yesterday the principal component analysis: you take a large amount of data and you want to reduce its dimensions, so you can put it in a form that you can run statistics on. Reinforcement learning is

basically situation and reward. The examples here: if you build software for a chess game, the reward is winning the game at the end; in a tennis game, the same, the reward is each point scored; and in the case of dog training, a treat with every good deed. That's what we call reinforcement learning. It's not part of the work we're doing, but it's one of the main examples. In unsupervised learning there are no labels or feedback, no expert

basically. It studies how input patterns can be represented to reflect the statistical structure of the overall collection of input patterns. No outputs are used, unlike in the case of supervised learning and reinforcement learning. And the outputs here, you put them into clusters and you try to get a density estimation. The example I mentioned earlier about the knee was actually unsupervised learning: when we wanted to know the sizes, we did not have a preconceived idea of what the sizes were. We didn't look at what implant sizes the companies have, because I knew that was not correct. So, we did some clustering here,

completely unsupervised. The clustering here can be hierarchical, and as examples we have hard clustering like K-means and soft clustering like fuzzy C-means. Hierarchical clustering is used more in the gene-expression type of experiments. In the case of hard clustering, K-means, you decide that you want this data divided into a number of clusters, and according to the distances between the data points it will put the data into that number of clusters for you. And the soft clustering, which was an example mentioned early yesterday, uses fuzziness in it; it's called fuzzy C-means. You don't know the boundaries exactly between the data and there's some fuzzification going on, but accordingly, the algorithm will try its best to cluster the data.
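To make the hard-clustering step concrete, here is a minimal K-means sketch in plain Python. The function name, the one-dimensional "width" data, and the two-cluster choice are all illustrative assumptions, not the lecture's actual implant analysis:

```python
import random

def kmeans_1d(data, k, iters=50, seed=0):
    """Plain K-means on 1-D measurements (e.g., hypothetical widths in mm)."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[j].append(x)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# two well-separated size groups (made-up numbers, for illustration only)
widths = [58, 59, 60, 61, 70, 71, 72, 73]
print(kmeans_1d(widths, 2))   # → [59.5, 71.5]
```

The same assignment/update loop carries over to more dimensions by swapping the absolute difference for a Euclidean distance.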

The clustering applications in our case were, really, finding the optimum number of implant sizes. As you can see, the plots here show a cross-population comparison, showing that the knee differs between Black, White, and Asian populations. That's an example of K-means, where, from the data, we found a difference in the sizes between ancestral, again, ancestral populations, okay. [Laughter] We call them ethnic. And then another example, by segmentation, which used fuzzy C-means here, was in the other grant with NIJ.

>>You would try to segment the skull thickness, and here we used the fuzzy C-means, because sometimes it's hard to actually go and segment it manually. So, with the help of the fuzzy C-means, we could actually calculate, or segment, the thickness of the skull.
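For the soft-clustering side, a compact fuzzy C-means sketch shows the membership idea: every point belongs to every cluster to some degree rather than getting a hard label. This is an illustrative toy on made-up one-dimensional data, not the actual skull-segmentation code:

```python
def fuzzy_cmeans_1d(data, c=2, m=2.0, iters=100):
    """Fuzzy C-means sketch on 1-D data with fuzzifier m."""
    lo, hi = min(data), max(data)
    centers = [lo + i * (hi - lo) / (c - 1) for i in range(c)]  # spread out
    for _ in range(iters):
        # membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = []
        for x in data:
            d = [abs(x - ck) or 1e-12 for ck in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1))
                                for k in range(c))
                      for j in range(c)])
        # center update: weighted mean with weights u_ij^m
        centers = [sum(u[i][j] ** m * x for i, x in enumerate(data)) /
                   sum(u[i][j] ** m for i in range(len(data)))
                   for j in range(c)]
    return centers, u

centers, u = fuzzy_cmeans_1d([1, 2, 3, 10, 11, 12])
print([round(ck, 2) for ck in centers])   # one center per group
print([round(uj, 3) for uj in u[0]])      # memberships of the first point
```

The memberships in each row sum to one; points near a center get a degree close to one for that cluster, and points between clusters get intermediate degrees — the "fuzzification" mentioned above.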

Supervised learning, on the other hand, needs an expert. It's divided into two phases, and a lot of you are probably aware of them; they're in a lot of applications. The first phase is training: because you have an expert, you need to train on the data, and for a dataset the expert provides the labels. That happens a lot, actually, in medical imaging or diagnosis. There was a lot of nice work done on cancer detection from mammography, where the expert will look at regions and say whether or not these regions were found. But the FDA does not really like the computer or the software to make the decisions, and that's why it's very hard to standardize this. Still, this is an example where the expert will come and he or she will put the labels, and the goal is to find the most probable model that generalizes the training data. In testing, you use the inferred model to predict the label of a new point. So, the whole idea is you train, and then, if you're faced with a new problem, you're able, according to this training, to predict for the new point. Of course, a lot of problems come with this, as I'll explain later. So, examples of supervised

learning: handwriting recognition, which is used a lot, with data from pen motion; character recognition, that's another example, where a scanned document is an image and gets turned into words — that's optical character recognition; and disease diagnosis, as I said, where the properties of a patient's symptoms could be used too. Face recognition is very famous and used a lot in security and other areas, where you have a picture of a person's face and then the person's name, not unlike the way some anthropologists try to use face recognition to identify a person. It's used also in the computer vision and pattern recognition area, where you can actually identify a person among thousands from a camera; those are other face recognition applications, a little harder than having a picture. And then, of course, spam detection from email: all of you see spam every day, and there is software out there to detect spam; that's also another example. Now, classification

is very important, and basically classification, as simple as it is, is trying to separate two datasets from each other. So, in this case, the example is ancestry detection. Here we have the nasal breadth and the nasion-basion distance, and if you look at this carefully, you find the blue coming into the red, and you need to find a model that can separate them; this is basically a model that can really go and separate the two populations. The problem is we run into something here called overfitting. That's very specific: the more complicated you get with separating the data like that, the more you can run into overfitting. Overfitting is when you tailor your model completely to your problem: you come with something else, and the system fails. It works 100 percent, 99 percent, on your data, but out in the world it fails.
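Overfitting is easy to demonstrate: a one-nearest-neighbour classifier memorizes its training data, so it scores perfectly there while doing worse on fresh samples from the same populations. The two overlapping Gaussian "classes" below are synthetic stand-ins, not the lecture's ancestry data:

```python
import random

def nn_label(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(train, pts):
    return sum(nn_label(train, x) == y for x, y in pts) / len(pts)

random.seed(1)
def sample(mu, label, n):
    # two overlapping Gaussian classes: a synthetic stand-in for two populations
    return [(random.gauss(mu, 2.0), label) for _ in range(n)]

train = sample(0.0, "A", 30) + sample(2.0, "B", 30)
test  = sample(0.0, "A", 200) + sample(2.0, "B", 200)

train_acc = accuracy(train, train)   # memorized: always perfect
test_acc  = accuracy(train, test)    # generalization is what actually matters
print(train_acc, test_acc)
```

The gap between the two numbers is the overfitting: the model has tailored itself completely to the training sample.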

In the classification techniques, of course, there are the linear classifications and then the nonlinear. Examples of linear are perceptron learning, that's like the early neural networks, and linear discriminant analysis, which you in the anthropology, or physical anthropology, community are very familiar with. There are also nonlinear discriminants, and then the newer support vector machines; this is the latest, and I'll give an example here of the use of classification techniques. In the nonlinear, you have the backpropagation neural network, that's where the neural network, if you've heard the term, comes in, and then we have the radial basis functions, and then the nonlinear support vector machine and decision trees. So, here, for example, are some of the classification

applications that we have. For example, here is some work of Dr. Emam; he did it way back with some other people on chromosome classification. And then we did the patellar sexing that we spoke about yesterday; that's another example of classification. There is kinematic classification, from my work on implant design, where we can actually classify the different types of motion according to the design of the implant. And our work with NIJ on skull sexing; all of these use classification. The density mapping we spoke about yesterday also uses some sort of classification to know exactly the different regions. And then we use it also in image enhancement, where you have a noisy image coming in and we can actually, basically, denoise the image. All of these applications use classification. Now, the regression

empirical modeling here. To find a function– functional

description of data was the goal of predicting values

for a new input. That’s the whole

idea of regression, predicting a new input. So, if you have a giving data

and find prediction function F of X that can predict the

value of Y from X. So, that’s– that’s as simple as here. Example like regression,

you have a lin– all of you know reg– linear regression or

nonlinear regression where you feed the data in. Now regression also divided into
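The basic idea — fitting a prediction function f(x) so it can produce y for a new x — can be sketched with ordinary least squares for a single predictor. The data here are made up to give an exact line:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form, one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]          # exactly y = 2x + 1
a, b = fit_line(xs, ys)

def predict(x):
    return a * x + b

print(predict(10))            # → 21.0
```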

Now, regression is also divided into parametric and nonparametric. In the parametric, which, as we said, requires the expert, you have neural networks, support vector machines, and linear regression. In the nonparametric, where basically you have no expert, you have the nearest neighbor, the weighted average, kernel regression, and locally weighted regression. Our work in the next lecture, and basically the software we're going to use, is based on the kernel regression. So the parametric, as we said,

includes linear regression, neural networks, support vector machines, and other techniques that map relationships [laughter] in data by optimizing different parameter values, using a dataset that is similar but not exact. Once the parameters for the model are identified, the training data are no longer used, and the model's prediction equation is set. As I said earlier, the problems you have are overfitting and that, in the case of new data, the model has to be retrained. Another thing you can run into with parametric models is data fitting. We'll give an example

here of something like a neural network. We do not use neural networks, but we're giving it as an example of a kind of parametric model with a training dataset. So, with a single perceptron, which is linear, the examples we gave are the logic problems AND and OR: if you have a one AND-ed with a zero, then the output is zero, okay; AND and OR are very simple logic problems. Yet, as simple as this is, the single perceptron, which is linear, failed in a case like XOR, the exclusive-or. The problem was solved using a nonlinear technique, the multilayer backpropagation network, which is nonlinear and consists of multiple layers. I'll give the example very soon. It has the input, output, and hidden layers of neural networks, and it does a few things, but it solves the problem, and actually the whole idea is to create multiple hyperplanes when you go beyond three dimensions. If you have two-dimensional data, you can separate it by a straight line; if you have three dimensions, by a plane; well, if you have a four-dimensional problem, we call it a hyperplane. It's something you cannot visualize, but we can relate it back to two dimensions and three dimensions. The single perceptron came

from the neural network work that was big from the '60s through the '80s; in the '90s and now, the support vector machine is coming in. But the whole idea, basically, was exactly modeling a neuron; it's as simple as that. You have the output coming from a number of inputs, and then you have a threshold, exactly like how a single neuron works. And then, basically, in the case of the AND problem here, the perceptron is separating the classes: if you have 1 and 0, the output is 0, and 1 and 1 gives 1, so basically it's solving the problem. Alright. But this could not be solved in the other case down here, where you have the exclusive-or, where with two inputs that differ the output is 1: so if we have 1, 1 the output is zero, and 1, 0 the output is one. The linear perceptron did fail there.
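The AND/XOR story can be reproduced in a few lines: a single perceptron trained with the classic error-correction rule learns AND, but no linear threshold can represent XOR, however long it trains. This is a generic sketch, not code from the lecture:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Single perceptron: w.x + b thresholded at 0, trained with the
    classic error-correction rule."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

f_and = train_perceptron(AND)
print([f_and(x1, x2) for (x1, x2), _ in AND])   # learns AND: [0, 0, 0, 1]
f_xor = train_perceptron(XOR)
print([f_xor(x1, x2) for (x1, x2), _ in XOR])   # fails on XOR: not [0, 1, 1, 0]
```

XOR's classes cannot be split by one line in the plane, which is exactly why the multilayer, nonlinear network described next was needed.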

Then they went to the multilayer backpropagation, where you keep adjusting the weights and training the system until the problem is actually solved. It's a classification problem as simple as that, and what solved it was the more complicated multilayer backpropagation: the data is presented to an input layer and then passed to a hidden layer. Now, the problem with neural networks, and maybe it's why I haven't seen a lot of work in anthropology with neural networks, except when we did one paper on the patella where we used a neural network plus discriminant analysis, is that the neural network requires expertise: how many hidden layers? That comes from experience. A lot of tuning parameters come from the expert.

>>So that's why it's not really universally used in a lot of applications, and the support vector machine, which is based on more rigorous mathematics, is actually coming in to replace neural networks. Support vector machines have a

strong mathematical basis, which is basically this: if you look at these two datasets, you want to find the best plane that really separates the two. So, in the linear case, you find a plane that maximizes the margin, and the support vectors are the points that the margin pushes against. So we have a plane separating these two datasets. Now, it's easier in the linear case; in the nonlinear case we use a different method, but support vector machines are very reliable and strong in separating datasets.
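The maximum-margin idea can be sketched with sub-gradient descent on the regularized hinge loss, which is the principle behind linear SVMs. This is a toy, not a production solver, and the two point clouds are made up:

```python
def linear_svm(points, labels, lam=0.01, epochs=1000, lr=0.01):
    """Toy maximum-margin classifier: sub-gradient descent on the
    regularized hinge loss, the principle behind linear SVMs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):   # y is +1 or -1
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:                        # point inside the margin
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:                                 # only shrink the weights
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

# two linearly separable point clouds (made-up coordinates)
pts = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
ys  = [-1, -1, -1, 1, 1, 1]
f = linear_svm(pts, ys)
print(f(1, 1), f(6, 5))   # the two extremes land on opposite sides
```

The points closest to the final plane — here the ones with coordinate sums 3 and 10 — are the support vectors the margin pushes against.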

Now, here's an example in a nonlinear case. In this case, how do you separate this dataset, the blue from the red, unless you draw a circle around it? But that's not the linear case. So there is something called the kernel trick. The kernel trick is taking the problem from one domain, in which it is inseparable, and multiplying by a certain function that takes the problem into a higher dimension in which it is separable. That's the trick here. It really takes the data from this domain into the other one, and basically you then have a plane that can separate the red from the blue. That's, in simple terms, how you deal with data that is inseparable.
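The trick can be illustrated directly: points inside and outside a circle cannot be split by a straight line in the plane, but after the feature map behind a polynomial kernel, (x, y) → (x², y², √2·xy), a plane does separate them. The specific points are made up:

```python
# points inside a circle of radius 2 are class -1, outside are class +1;
# no straight line in (x, y) separates them
inside  = [(0, 0), (1, 0), (0, 1), (-1, -1)]
outside = [(3, 0), (0, 3), (-3, 1), (2, 2)]

def lift(p):
    """Feature map behind the degree-2 polynomial kernel:
    (x, y) -> (x^2, y^2, sqrt(2)*x*y)."""
    x, y = p
    return (x * x, y * y, 2 ** 0.5 * x * y)

def side(p):
    # in the lifted space the plane u + v = 4 (i.e., x^2 + y^2 = 4) separates them
    u, v, _ = lift(p)
    return -1 if u + v < 4 else 1

print([side(p) for p in inside + outside])   # → [-1, -1, -1, -1, 1, 1, 1, 1]
```

A kernel SVM never computes the lifted coordinates explicitly; the kernel function evaluates their inner products directly, which is what makes the trick cheap.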

I used this work with some of my students, including Dr. Emam here, and others. For example, we wanted to know from MRI data whether a given point is cartilage or bone, which is very difficult in MRI, so we used this to decide whether a point is cartilage or bone. So, that's that. Now, in the

nonparametric regression, you store the actual training data to inform future predictions, keeping the training data in a memory matrix. Rather than modeling the whole input space with a parametric model such as a neural network or linear regression, local techniques construct a local model in the immediate region of the query. These models are constructed on the fly. The whole idea here is that in parametric regression, the training takes a lot of time and experts, so what we want is a much simpler approach, where you have something like a memory. Actually, the kernel regression we use in our work, they call it lazy training, because it really does not require a lot of training beforehand; basically, it can predict on the fly. That's the whole concept of it. And really, the concept

came from working on this NIJ grant with a number of people, including Dr. Wesley Hines, who had been working on this regression modeling in nuclear engineering. They used it for fault tolerance in nuclear reactors; they wanted something quick, because you can't retrain, and kernel regression was one of the methods used in fault tolerance. We actually took it, with him, and applied it to the problem here, and it did work very, very well. When the query is made, the

algorithm locates training input patterns in its vicinity and performs a weighted regression with the similar observations. The observations are weighted with respect to their proximity to the query point. And in order to construct a robust local model, one must define a distance function to measure what is considered local to the query, implement the local regression, and consider smoothing techniques such as regularization. That's a lot of words, but the point is that when you're using these models, you have to pick a measure, say the Euclidean distance between two points; you have to define something the model can work on, and then basically the model takes over from that point. The types, like I explained, are the nearest neighbor; the weighted average, where we take a number of points and weigh them into an average; locally weighted regression; and kernel regression. These are the types of nonparametric regression. Here is an example:

if you look at this, there is a linear function, y = x, and a nonlinear function, 4x minus 1/25 x squared, and basically we're going to use these as examples for the different kinds of models we explained, to see if they can predict these shapes. So, we're not using math here; we're not solving it mathematically. Instead we take a sample of points, feed it to these models, and see whether they can recover the shapes. It's simple mathematically, but for prediction it's not so simple. So, just for the example here, we'll feed the model a sample of points, like 0, 15, 30, and test the model with the complete dataset.

So, if we try the nearest neighbor, which is one of the models: if we want to predict the value of a potentially noisy data point, we estimate it with the nearest neighbor among the saved data. The nearest neighbor can be found using a distance measure, as I mentioned, the Euclidean distance, and we compare against the function using the nearest neighbor. As I said, this is a very crude method: the nearest neighbor just takes the distance between the points, and this is the shape of the prediction — the line comes out stepwise. But it does kind of get an idea of the shape of the function.

Now, the weighted average; don't get bothered with the math, but basically the output is weighted more by nearby points. So, instead of having completely independent points, the prediction is weighted by the neighbors with respect to their distance. In this case, you're trying to predict Y-hat, the prediction at the point, and you weight it with information from the neighboring points. So you get a little bit better prediction here. Still rough, but it gets the overall shape of the line, just a little bit rough here and there.
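A sketch of that distance-weighted average: each stored point votes on the prediction with a weight inversely proportional to its distance from the query (the data are the same illustrative samples of y = x):

```python
def weighted_average(train_x, train_y, x, eps=1e-9):
    """Inverse-distance weighting: y_hat = sum(w_i * y_i) / sum(w_i),
    with w_i = 1 / distance, so nearby stored points count more."""
    ws = [1.0 / (abs(tx - x) + eps) for tx in train_x]
    return sum(w * ty for w, ty in zip(ws, train_y)) / sum(ws)

train_x, train_y = [0, 15, 30], [0, 15, 30]
print(round(weighted_average(train_x, train_y, 10), 2))
# → 12.86 (true value is 10: smoother than nearest neighbour, still rough)
```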

Locally weighted regression, that's another way: you use linear regression, solving the following linear model, where Y is the vector of sampled response variables, X is the matrix of predictor variables, whose rows are the sampled observations, and beta is the vector of regression coefficients that linearly combine the predictors to form the response. So here you solve the weighted least squares form of the regression equation for the optimal estimates; this is some of the math behind the weighted regression. The important idea is that beta-hat is basically the estimate of the coefficients themselves. So, you have the points, then you have estimates of the coefficients, and then you feed them into the equation. That's basically what explains the last curve, and why it came out so noisy.
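Locally weighted regression can be sketched by solving the 2×2 weighted least-squares normal equations for a line around the query point. The Gaussian weights and the width value tau are arbitrary choices for this toy data, not the lecture's settings:

```python
import math

def lwr_predict(train_x, train_y, x0, tau=8.0):
    """Locally weighted linear regression at the query x0: fit y = a + b*x
    with Gaussian weights centered on x0, then evaluate it at x0."""
    w = [math.exp(-((tx - x0) ** 2) / (2 * tau ** 2)) for tx in train_x]
    # weighted sums for the 2x2 normal equations
    s0 = sum(w)
    s1 = sum(wi * tx for wi, tx in zip(w, train_x))
    s2 = sum(wi * tx * tx for wi, tx in zip(w, train_x))
    t0 = sum(wi * ty for wi, ty in zip(w, train_y))
    t1 = sum(wi * tx * ty for wi, tx, ty in zip(w, train_x, train_y))
    det = s0 * s2 - s1 * s1
    a = (s2 * t0 - s1 * t1) / det
    b = (s0 * t1 - s1 * t0) / det
    return a + b * x0

# samples of y = x: the local line recovers the function exactly
xs = [0, 5, 10, 15, 20, 25, 30]
print(lwr_predict(xs, xs, 12.0))   # → 12.0
```

Because the stored data here lie exactly on a line, the local fit is exact; on noisy data the local line only approximates the function near the query.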

Now we're moving to the kernel regression, which is what we used. In kernel regression, you have input examples X and output examples Y; it's not training as such but, as I said, local training. Then for a query, an input point x, you calculate the distance using the Euclidean distance, you have a kernel function, which could be a Gaussian weighting, and then you use the weighted average to predict the output Y. This is basically what generated the curve in the figure that I showed. As I said, the most common kernel function is the Gaussian kernel, and that's what we used: d here is the distance, the Euclidean distance, Y-hat is the predicted output, and the w's are the weights.

So, here is the difference from what we've seen before: when we used the kernel regression, we had a much, much smoother and better output. But as with everything I've said, again, there's no free lunch here. With kernel regression you have to worry about what we call the bandwidth. The bandwidth, if you look at the Gaussian, is basically its standard deviation, so it's again an optimization problem. If you increase the bandwidth too much, you end up with a prediction that doesn't follow the function; you end up with something like that. If you reduce it too much, then you have the other problem of not getting a good result. So it's an optimization on the standard deviation; we call it the bandwidth.

>>The bandwidth should always

be chosen large enough to cover the neighboring points; that's what we're talking about. So we found, for example, in our problem, that around 2.6 was an acceptable bandwidth to use with the data. Of course you have some margin of error, but you expect that, and the error is very small compared to the other methods. Kernel regression models are also divided into what we call the heteroassociative model, where the number of inputs does not equal the number of outputs of the model, so you are looking at a different number of inputs and the outputs may not be the same; and the inferential model, where the number of outputs from the model is one, so basically you are looking at one parameter, one thing, and you have multiple inputs to it. And that's basically the math

behind what we're going to explain in the next lecture. Basically, we found that the kernel regression was superior to the other methods being used; it gave us better prediction than other methods like neural networks or the parametric methods. So that's basically the whole point. The take-home message is that kernel regression is a very effective tool when we introduce it into a problem like predicting the soft tissue thickness and helping in the facial reconstruction. Thank you. Any questions?

>>A little too much scary math for people first

thing in the morning?

[ Laughter ]

>>Sorry, we tried to minimize the math as much as we can, but I didn't want to take it out completely and have it look like a magic box. It's not; these are well-known methods, but put together in a different way. I think the new thing here is taking something like kernel regression, which lives in hardcore computer vision or nuclear engineering, and trying to use it to estimate the soft tissue. That's the new, or I think novel, idea: to get away from training datasets, experts, and neural networks, which have been around for 30 years, and use something that, as you will see in the next lecture, predicts the thickness on the fly. Now.

>>I have a question for you. Sorry to interrupt. I know I've seen, in the

last couple of years, in physical anthropology, in forensic anthropology, they're claiming, I think in paleoanthropology as well, that all of a sudden they have discovered neural networks. So, even though they've been around for so long, you would suggest that we do not take that route?

>>Yes, and I'll tell you why. In anthropology, by the way, I'll

give you an example, because I've been working now seven years, since 2003, with the guys at UT. I've seen very sophisticated statistics, but it's like talking a different language. For example, linear discriminant analysis is huge in anthropology. Linear discriminant analysis is ABC to us, because we're beyond that; all our problems are nonlinear. So instead of the linear discriminant, we use something called the quadratic discriminant.

>>And we've used this too.

>>Yeah. And the quadratic,

it's more in the nonlinear. So there is around a 20-year lag between the hardcore engineering methods being applied in their own fields and their moving to another area like physical anthropology. So, neural networks are a problem you're going to face, and I think in the paper on the patella I did compare the neural network and the linear discriminant, and both of them gave good results. The problem is, as I said, neural networks, and I still use them sometimes, though I'm moving to support vector machines, need a lot of tuning up front. You rely a lot on the expert; and how does the expert know which parameter coefficients? He runs a number of experiments. So, you have to have that, and there is the common wisdom in neural networks. If, for example, you're going to

use a neural network, it's machine learning, so we're going to train; we call it training your model. So you have to take your experimental data and divide it into, say, 25 percent for training and 75 percent that the system has not seen before. But in order to do something right and get decent results, your training dataset must be huge. Now, in some cases with the crania, you have 5 or 10 skulls from a Hispanic population, and for the number of points we're using, well, let's take the example of your work: if you have all these points that you're collecting and you want to use a neural network and training, you must have 10 times that number in your dataset, so.

>>Right. It's just unrealistic [simultaneous talking].

>>Unrealistic is

the kind of situation where you have to use something else, and the something else is what we call the lazy training here, the kernel. I'm fascinated by it, by the way, because it does not require all of this and it gives decent results, though again you expect some errors. So, that's my point. I mean, they are discovering the neural network because some people started talking about it, and we were among the people who used neural networks within anthropology, but I think, right now, you can try it, but you should move to the current methods being used in the computer vision, pattern recognition, and machine learning communities.

[ Pause ]

>>Any other questions? So, the point is that the kernel

regression is the method that we use in the prediction, and I'll give a small introduction here for the next lecture; I have five minutes. The whole idea with the kernel regression is not that we're saying we're going to use a different method of reconstructing soft tissue to help the identification; we're saying we're giving multiple scenarios. The whole idea in the kernel regression is that we tie it to scenarios: the body mass index, for example, turned out to be crucial in the soft tissue thickness prediction. With the tables that exist, we could even use it as an input, or actually give not one but multiple scenarios. So if you find a skull and you don't have any other information, and the artist is going to render, or build, the soft tissues, you can help the artist, the forensic artist, by giving multiple scenarios, like saying, "Okay, what happens if this person has a certain body mass index?" Then he uses this software to get different thicknesses, so basically he can render, or build, a model in multiple scenarios. That gives more leverage in the identification. And we'll see this next lecture, okay?

[ Music ]

[ Applause ]