Forensic Anthropology 2011: 06: Empirical Modeling


[ Music ]
>>Good morning. Today I'm going to give two lectures before we go to the workshops. The first one explains a little of the empirical modeling we used in facial reconstruction to get better prediction of the markers. I will give a quick outline here. There is some mathematics, but I will try to explain it as best I can, and if you have questions, please go ahead and ask me.
The outline is basic. I'm going to cover empirical versus theoretical modeling, then machine learning, its history, applications, and types, and then regression models, which are basically parametric and nonparametric modeling. The reason we wanted to explain the history and a little of the mathematics behind the next lecture is to give you an idea of where we came up with the numbers used to help in the facial reconstruction.
Empirical statistics means calculating a probability, or information about an event, from past experimental data; it is determined by data from an actual experiment. Theoretical statistics, on the other hand, means calculating a probability, or information about an event, based on a sample space of non-equally likely outcomes, determined by finding all the possible outcomes theoretically and calculating how likely the given outcome is. That is the difference between the two: for the empirical approach you really need to have an experiment, and you take the output from that experiment and use it in your models.
Machine learning is a term I want to explain here, because some of the kernel regression we use in the facial reconstruction has strong roots in machine learning, in computer vision, and in some of the other work we do. Machine learning is basically learning changes in a system that are adaptive in the sense that they enable the system to do a task, or tasks drawn from the same population, more effectively the next time. The whole idea of machine learning, as you can see, is that it is always driven by empirical data from the real world, and the goal is to teach the computer how to use this data to predict future data or to recognize certain events or objects. When I say teach the computer, I mean it is algorithm- and software-driven. The computer by itself will not learn; of course, you have to write the programs for that. So you have the world, you have the prediction, and when a new event comes in, you can predict what that new event is. For example, if you teach your computer some patterns, like the sample shown here, it learns the previous patterns, and then when something new comes in, your software should predict whether it is a cat or something else. That is basically what we mean by machine learning.
We actually did this in some of my work. The slide on the top left is some of the work I did on the gender knee, where we introduced the concept that men and women differ, which the orthopedic industry for a while denied; they had one implant, or a number of sizes, meant to fit both men and women. We proved that this was wrong by machine learning, by experimental data: we showed that the medial side, the anterior-posterior dimension, and the lateral side are different. That was a simple example. As you can see in the flow, the learning here was done by modeling the statistics, and the two peaks mean the groups are different. The prediction model then takes this data in and predicts sizes for a new population or a new person. So the machine learns, and then you predict.
Machine learning means optimizing a performance criterion using example data or past experience. The role of statistics is inference from a sample, and the role of computer science, the part I called the computer here, is efficient algorithms to solve the optimization problem and to represent and evaluate the model for inference.
Machine learning is used when human expertise is absent, for example in navigating on Mars; that is basically all machine learning. It is used when humans are unable to explain their expertise, as with speech recognition; there we use machine learning for vision and language. It is used when the solution changes over time, like the stock market; all of those predictive models are machine learning. And it is used when solutions need to be adapted to a particular case, like biometrics, or when the problem size is too vast for our reasoning capability, for example calculating page ranks. These are examples of how machine learning is used. Some examples a lot of you have probably seen or known before: looking at the patterns of ears or fingerprints, which is used sometimes in imaging, and in ultrasound for attenuation. Another is using infrared cameras to look at veins; every person has their own pattern, and that is used in biometrics. These are some of the very obvious applications of machine learning.
Other applications you may not be aware of use the same science: retail and market-basket analysis, and customer relationship management. In biometrics it is used all the time, for voice recognition, fingerprints, and the iris. In finance, for credit scoring and fraud detection. In manufacturing, for optimization and troubleshooting. In medicine, for medical diagnosis, and in telecommunication, for optimizing service quality. In bioinformatics there is the famous DNA sequencing and gene expression work, which a lot of you know, trying to understand the patterns. And it is used in web mining for search engines; the big ones, like Google, are using the same methods.
The types of machine learning can be divided into supervised learning, reinforcement learning, and unsupervised learning. In supervised learning you have classification, where you, as an expert, can supervise the classification by providing labels because you know the problem, and you have regression. Unsupervised learning is where you really do not have the expert, and you try to understand the data from the data itself. In unsupervised learning you have clustering, association, and then dimensionality reduction, which is for when you have a huge dataset; an example of dimensionality reduction I mentioned yesterday is principal component analysis. You take a large amount of data and reduce its dimensions so you can put it in a form that you can run statistics on.
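To make the principal component analysis idea concrete, here is a minimal Python sketch using scikit-learn; the made-up "measurements" array and the choice of three components are illustrative assumptions, not data from the study.

```python
# Minimal sketch: reduce a high-dimensional dataset to a few components with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
measurements = rng.normal(size=(100, 20))    # pretend: 100 specimens, 20 measurements each

pca = PCA(n_components=3)                    # keep the 3 directions with the most variance
reduced = pca.fit_transform(measurements)    # shape (100, 3), ready for ordinary statistics

print(reduced.shape)
print(pca.explained_variance_ratio_)         # how much of the spread each component keeps
```

The explained-variance ratios are what tell you how safely you can work in the smaller number of dimensions.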
Reinforcement learning is basically a situation and a reward. For example, in a chess game, if you are building software, the reward is winning the game at the end. In a tennis game, the reward is each point scored. In dog training, it is a treat with every good deed. That is what we call reinforcement learning. It is not part of the work we are doing, but it is one of the main categories.
Unsupervised learning has no labels or feedback; basically, no expert. It studies how input patterns can be represented to reflect the statistical structure of the overall collection of input patterns. No outputs are used, unlike in supervised learning and reinforcement learning. The results here are things like putting the data into clusters and trying to get a density estimation. The knee example I mentioned earlier was actually unsupervised learning, when we wanted to know the sizes. We did not have a preconceived idea of what the sizes were, and we did not look at the implant sizes the companies offered, because I knew those were not correct. So we did some clustering there, completely unsupervised.
Clustering can be hierarchical, hard clustering such as K-means, or soft clustering such as fuzzy C-means. Hierarchical clustering is used more in gene-expression types of experiments. In hard clustering like K-means, you decide that you may want the data divided into a certain number of clusters, and according to the distances between the data points it will form that number of clusters for you. Soft clustering, an example of which was mentioned early yesterday, uses fuzzy logic and is called fuzzy C-means: you do not know the boundaries between the data exactly, there is some fuzzification going on, but the algorithm will do its best to cluster the data accordingly.
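As a minimal sketch of the hard-clustering case, here is K-means from scikit-learn run on invented two-dimensional data; the numbers, the three-cluster choice, and the pretend "measurements" are assumptions for illustration, not the knee data. Fuzzy C-means works the same way except that each point receives a degree of membership in every cluster rather than one hard label.

```python
# Minimal K-means sketch: propose groupings (e.g., candidate implant sizes) from data alone.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Pretend these are two bone measurements (mm) for 150 specimens, in three loose groups.
data = np.vstack([
    rng.normal([55, 60], 2.0, size=(50, 2)),
    rng.normal([62, 68], 2.0, size=(50, 2)),
    rng.normal([70, 75], 2.0, size=(50, 2)),
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
print(kmeans.cluster_centers_)   # one candidate "size" per cluster
print(kmeans.labels_[:10])       # which cluster each specimen fell into
```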
One clustering application, in our case, was finding the optimum number of implant sizes. The plots here show a cross-population comparison, and they show that the knee differs between Black, White, and Asian populations. That is an example of K-means, where from the data we found a difference in sizes between ancestral populations, okay. [Laughter] We call them ethnic. Another example, using fuzzy C-means, is segmentation, and this is in the other grant with NIJ.
>>We wanted to segment the skull thickness, and here we used fuzzy C-means because it is sometimes hard to actually go and segment it manually. With the help of fuzzy C-means, we could actually calculate, or segment, the thickness of the skull.
Supervised learning, on the other hand, needs an expert. It is divided into two phases, which a lot of you are probably aware of, and they appear in a lot of applications. The first phase is training. Because you have an expert, you need to train on the data, and for a dataset the expert provides the labels. That happens a lot in medical imaging and diagnosis; for example, a lot of nice work has been done on cancer detection from mammography, where the expert will say, "I looked at these regions and did not find anything in those regions." The FDA, though, does not really like the computer or the software to make the decisions, which is why it is very hard to standardize this. But it is an example where the expert comes in and puts on the labels, and the goal is to find the most probable model that generalizes the training data. In testing, you use the inferred model to predict the label of a new point. So the whole idea is that you train, and then if you are faced with a new problem, you are able, according to this training, to predict for the new point. Of course, a lot of problems come up with this, which I will explain later.
Examples of supervised learning: handwriting recognition, which is used a lot, with data from pen motion; character recognition, another example, where a scanned document image is turned into words with optical character recognition; disease diagnosis, as I said, where the properties of a patient's symptoms can be used too; and face recognition, which is very famous and used a lot in security and other areas, where you have a picture of a person's face and then the person's name. That is also the way some anthropologists try to use face recognition to identify a person. It is used even more in the computer vision and pattern recognition area, where you can actually identify a person from a camera among thousands; those are other face recognition applications, a little harder than having a picture. And then of course there is spam detection from email; all of you see spam every day, and there is software out there to detect spam, so that is another example.
Now, classification is very important. Classification, as simply as I can put it, is trying to separate two datasets from each other. In this case the example is ancestry detection. Here we have the nasal breadth and the nasion-basion measurements, and if you look at this carefully you find that the blue comes into the red, so you need to find a model that can separate them, and this is basically a model that can really go in and separate the two populations. The problem is that we run into something called overfitting, and that is very specific: the more complicated you get with separating the data like that, the more you can run into overfitting. Overfitting means you tailored your model completely to your own problem; when you come in with something else, the system fails. It works 100 percent, or 99 percent, on your data, but once it goes out to new data it fails.
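Here is a minimal sketch of that overfitting behavior: polynomials of increasing degree are fit to noisy points, and the most flexible fit typically looks best on the training half while doing worse on the held-out half. The data and the polynomial degrees are invented for illustration.

```python
# Minimal overfitting sketch: flexible models can fit the training points and still
# generalize poorly to points they have not seen.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.size)
train, test = np.arange(0, 40, 2), np.arange(1, 40, 2)   # alternate points held out

for degree in (1, 3, 9):
    coeffs = np.polyfit(x[train], y[train], degree)
    train_mse = np.mean((np.polyval(coeffs, x[train]) - y[train]) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, held-out MSE {test_mse:.3f}")
```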
So, among classification techniques there are of course linear classifications and then nonlinear ones. Examples of linear techniques are perceptron learning, which is an early form of neural networks, and linear discriminant analysis, which you in the anthropology and physical anthropology community are very familiar with. There are also nonlinear discriminants, and then the newest technique, support vector machines, which is the latest, and I will give an example of its use among the classification techniques. On the nonlinear side you have the backpropagation neural network, which is where the neural network, if you have heard the term, comes in, and then radial basis functions, nonlinear support vector machines, and decision trees.
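Since the linear discriminant is the familiar tool here, this is a minimal sketch comparing a linear and a quadratic discriminant classifier in scikit-learn on two simulated groups; the two features and all of the numbers are invented stand-ins, not craniometric data.

```python
# Minimal sketch: linear vs. quadratic discriminant analysis on two simulated groups.
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
group_a = rng.multivariate_normal([25, 100], [[4, 1], [1, 9]], size=100)
group_b = rng.multivariate_normal([28, 104], [[6, -2], [-2, 5]], size=100)
X = np.vstack([group_a, group_b])
y = np.array([0] * 100 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "held-out accuracy:", model.score(X_te, y_te))
```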
Here, for example, are some of the classification applications we have. This is some work Dr. Emam here did a while back with some other people on chromosome classification, and then we did the patellar sexing we spoke about yesterday; that is another example of classification. There is kinematic classification, from my work on implant design, where we can actually classify the different types of motion according to the design of the implant. In our work with NIJ on skull sexing, all of these use classification. The density mapping we spoke about yesterday also uses some sort of classification to know exactly the different regions. And we also use it in image enhancement, where a noisy image comes in and we can basically denoise it. All of these applications use classification.
Now, for the regression portion of the empirical modeling: the goal is to find a functional description of the data in order to predict values for a new input. That is the whole idea of regression, predicting for a new input. Given some data, you find a prediction function f(X) that can predict the value of Y from X; it is as simple as that. As examples of regression, all of you know linear regression and nonlinear regression, where you feed the data in.
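As a minimal sketch of that idea, here is an ordinary linear regression fit with NumPy; the data are synthetic, so the fitted numbers mean nothing beyond the illustration.

```python
# Minimal regression sketch: fit a prediction function f(x), then predict at a new x.
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 10.0, 25)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)   # noisy linear relationship

slope, intercept = np.polyfit(x, y, deg=1)               # ordinary least squares line
predict = lambda new_x: slope * new_x + intercept
print(f"fitted line: y = {slope:.2f} x + {intercept:.2f}")
print(f"prediction at x = 12: {predict(12.0):.2f}")
```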
Regression is also divided into parametric and nonparametric. On the parametric side, which as we said requires the expert, you have neural networks, support vector machines, and linear regression. On the nonparametric side, where basically no expert is needed, you have nearest neighbor, weighted average, kernel regression, and locally weighted regression. Our work in the next lecture, and basically the software we are going to use, is based on kernel regression.
So the parametric side, as we said, includes linear regression, neural networks, support vector machines [laughter], and other techniques that map relationships in the data by optimizing different parameter values, using a dataset that is similar but not exact. Once the parameters for the model are identified, the training data are no longer used and the model's prediction equation is set. As I said earlier, the problems you have are overfitting and, when new data arrive, that the model has to be retrained; the trap you can run into with parametric methods is fitting the data too tightly. We will give an example here of something like a neural network; we do not use neural networks, but we give it as an example of a parametric technique with a training dataset.
So take a single perceptron, which is linear. The example we gave here is the logic problem of AND and OR: with AND, if either input is zero, the output is zero; these are very simple logic problems. Yet something as simple as the single perceptron, which is linear, fails in a case like XOR, the exclusive-or. That problem is solved using a nonlinear technique like multilayer backpropagation, which is nonlinear and consists of multiple layers. I will give the example very soon: it has an input layer, an output layer, and a hidden layer of neurons, which does a few more things but solves the problem. The whole idea is to create multiple hyperplanes once you go beyond three dimensions. If you have a lot of data in two dimensions, you can separate it with a straight line; in three dimensions, with a plane; and if you have a four-dimensional problem, we call the separator a hyperplane. It is something you cannot visualize, but you can relate it back to the two-dimensional and three-dimensional cases.
The single perceptron came from neural networks, which were big from the '60s through the '80s; in the '90s and now, the support vector machine is coming in. The basic idea was to model a neuron, as simple as that: you have an output coming from a number of inputs, and then you have a threshold, exactly like how a single neuron works. For the problem here, the task is separating the classes. So if you have 1 AND 0, the output is 0; if you have 1 AND 1, the output is 1; it is solving the AND problem. All right, but this could not be solved in the other case down here, the exclusive-or, where with two inputs the output is on only when the inputs differ: if we have 1, 1 the output is zero, and if we have 1, 0 the output is one. The linear perceptron fails there. So they went to multilayer backpropagation, where you keep adjusting the weights and training the system until the problem is actually solved. A classification problem as simple as this one could only be solved by something as complicated as multilayer backpropagation.
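Here is a minimal sketch of that point, with the weights set by hand rather than learned: a single threshold unit handles OR (or AND, with a higher threshold), while exclusive-or needs a hidden layer of two units. The particular weights and thresholds are just one illustrative choice.

```python
# Minimal sketch: one threshold unit computes OR/AND, but XOR needs a hidden layer.
import numpy as np

step = lambda v: (v > 0).astype(int)            # the threshold nonlinearity
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Single unit: fires when x1 + x2 exceeds 0.5, which is OR (a 1.5 threshold gives AND).
print("OR :", step(inputs @ np.array([1, 1]) - 0.5))

# Two hidden units feeding one output unit reproduce XOR.
h1 = step(inputs @ np.array([1, 1]) - 0.5)      # "at least one input is on"
h2 = step(inputs @ np.array([1, 1]) - 1.5)      # "both inputs are on"
print("XOR:", step(h1 - h2 - 0.5))              # on when exactly one input is on
```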
So the data is presented to an input layer and then passed to a hidden layer. The issue with neural networks, and maybe this is why I have not seen a lot of work combining anthropology and neural networks, except for one paper we did on the patella where we used a neural network plus discriminant analysis, is that the neural network requires expertise: how many hidden layers, for instance, comes from experience, and there are a lot of tuning parameters that have to come from the expert.
>>So that is why it is not universally used in a lot of applications, and the support vector machine, which is based on more rigorous mathematics, is actually coming in to replace neural networks.
Support vector machines have a strong mathematical basis. Basically, if you look at these two datasets, you want to find the best plane that really separates them. In the linear case, you find the plane that maximizes the margin, and the support vectors are the points that the margin pushes against; that is the margin. So we have a plane separating these two datasets. It is easier in the linear case; in the nonlinear case we use a different method, but support vector machines are very reliable and strong at separating datasets.
Now, here is an example of a nonlinear case. How do you separate this dataset, the blue from the red, unless you draw a circle around one of them? And that, again, is not the linear case. So there is something called the kernel trick. The kernel trick takes the problem from one domain, where in this case it is inseparable, and multiplies by a certain function that carries the problem into a nonlinear dimension where it is separable. That is the trick: it really takes the data from here into this other domain, where you have a plane that can separate the red from the blue. That, in simple terms, is how you deal with data that are not separable. I have used this work with some of my students, including Dr. Emam here, and others; for example, we wanted to know from MRI data whether something is cartilage or bone, which is very difficult in MRI, so we used this to decide whether a given point is cartilage or bone.
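A minimal sketch of the kernel trick with scikit-learn: two concentric rings cannot be split by a straight line, so a linear SVM stays near chance while the radial-basis-function kernel separates them. The ring dataset is synthetic, and this illustrates the idea rather than the MRI application itself.

```python
# Minimal kernel-trick sketch: linear vs. RBF support vector machine on concentric rings.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(f"{kernel:6s} kernel, training accuracy: {clf.score(X, y):.2f}")
```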
Now, in nonparametric regression, the actual training data are used to make future predictions, and the training data are stored in a memory matrix. Rather than modeling the whole input space with a parametric model such as a neural network or linear regression, local techniques construct a local model in the immediate region of the query, and these models are constructed on the fly. The whole idea is this: in parametric regression, the training takes a lot of time and expertise, so what we wanted was a much simpler approach, where you have something like a memory. The kernel regression we use in our work is what they call lazy training, because it really does not require a lot of training beforehand; it can basically predict on the fly. That is the whole concept of it. The concept really came from working on this NIJ grant with a number of people, including Dr. Wesley Hines, who had been working on this kind of regression modeling in nuclear engineering. They used it for fault tolerance in nuclear reactors, where they wanted something quick because they could not retrain, and kernel regression was one of the methods used in fault tolerance. We took it, with him, and applied it to the problem here, and it worked very, very well.
When a query is made, the algorithm locates the training input patterns in its vicinity and performs a weighted regression with the similar observations. The observations are weighted with respect to their proximity to the query point. In order to construct a robust local model, one must define a distance function to measure what is considered local to the query, implement the local regression, and consider smoothing techniques such as regularization. That is a lot of words, but the point is that when you are using these models you have to define a measure, for example the Euclidean distance between two points; you have to give the model something it can work on, and then the model basically takes over from there. The types of nonparametric regression include, as I explained, nearest neighbor; weighted average, where we take a number of points, weight them, and take the average; locally weighted regression; and kernel regression.
Here is an example. If you look at this, there is a linear function, y = x, and a nonlinear function, 4x minus 1 over 25x squared, and we are going to use these as examples for the different kinds of models we explained, to actually predict these shapes. We are not using the math here; we are not solving it mathematically. Instead we take a sample of points and see, when we feed these models, whether they can come up with the shape. It is simple mathematically, but as a prediction problem it is not so simple. So, just for this example, we will take a sample of points, feed the model with samples like 0, 15, 30, and test the model with the complete dataset.
So first we try nearest neighbor, which is one of the models. If we want to predict the value of a potentially noisy data point, we estimate it with the nearest neighbor's saved value. The nearest neighbor can be found using a distance measure, as I mentioned, the Euclidean distance, and we compare the function with its nearest-neighbor estimate. As I said, this is a very crude method: the nearest neighbor just takes the distance between the points and copies the closest one. This is the shape of the prediction; you can see the straight line comes out stepwise, but it does give a rough idea of the shape of the function.
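In code, that crude nearest-neighbor estimate looks something like this minimal sketch; the y = x test line is the one from the slide, while the sample spacing and the query points are illustrative choices.

```python
# Minimal sketch: predict a query by copying the y of its single nearest stored x.
import numpy as np

x_train = np.arange(0.0, 31.0, 5.0)        # a handful of sampled points: 0, 5, ..., 30
y_train = x_train.copy()                   # the linear test function y = x

def nearest_neighbor(x_query):
    distances = np.abs(x_train - x_query)  # Euclidean distance in one dimension
    return y_train[np.argmin(distances)]   # copy the closest stored output

print(np.array([nearest_neighbor(q) for q in (2.0, 8.0, 18.0)]))   # stepwise: 0, 10, 20
```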
Now, the weighted average. Do not get bothered by the math; basically, the output is weighted more heavily by nearby points. Instead of treating the points as completely independent, the estimate is weighted by the neighbors with respect to their distance. In this case you are trying to predict y-hat, the prediction at the point, and you weight it with information from the neighboring points. You get a somewhat better prediction here; it is still rough, but it captures the overall shape of the line, just a little roughly.
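The distance-weighted average can be sketched the same way; inverse-distance weights are used here purely for illustration, since the talk does not pin the weighting down, so every stored point contributes but nearby ones dominate.

```python
# Minimal sketch: an average of all stored outputs, weighted by closeness to the query.
import numpy as np

x_train = np.arange(0.0, 31.0, 5.0)        # the same sampled points
y_train = x_train.copy()                   # y = x test function

def weighted_average(x_query, eps=1e-9):
    weights = 1.0 / (np.abs(x_train - x_query) + eps)   # closer points weigh more
    return np.sum(weights * y_train) / np.sum(weights)

print(np.array([weighted_average(q) for q in (2.0, 8.0, 18.0)]))
```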
Locally weighted regression is another way: use linear regression and solve the linear model y = Xb, where y is the vector of sampled response variables, X is the matrix of predictor variables whose rows are the sampled observations, and b is the vector of regression coefficients that linearly combine the predictors to form the response. Here you solve the weighted least-squares version of the regression equation for the optimal estimate of b; that is the math behind the weighted regression. The important idea is that what you obtain is the estimate of the coefficients themselves: you have the points, you get estimates of the coefficients, and then you feed them into the equation. That also explains why the last curve was still somewhat noisy.
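Here is a minimal sketch of that weighted least-squares step: around each query, a small straight-line model is fit with Gaussian weights and evaluated at the query. The bandwidth of 5.0 and the y = x test data are illustrative choices, not values from the slides.

```python
# Minimal locally weighted regression sketch: fit a local line, weighting points by distance.
import numpy as np

x_train = np.arange(0.0, 31.0, 5.0)
y_train = x_train.copy()                   # y = x test function

def local_linear(x_query, bandwidth=5.0):
    w = np.exp(-(x_train - x_query) ** 2 / (2 * bandwidth ** 2))   # Gaussian weights
    X = np.column_stack([np.ones_like(x_train), x_train])          # design matrix [1, x]
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_train)         # solve (X'WX) b = X'Wy
    return beta[0] + beta[1] * x_query                             # evaluate the local line

print(np.array([local_linear(q) for q in (2.0, 8.0, 18.0)]))       # recovers 2, 8, 18 here
```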
Now we move to the kernel regression, which is what we used. In kernel regression you have input examples; it is not training in the usual sense but, as I said, local or lazy training. You have input examples X, output examples Y, and the query, the input point x, and you calculate the distance using the Euclidean distance. You have a kernel function, which could be a Gaussian, to weight the examples, and then you use the weighted average to predict the output Y. This is basically what produced the curve in the figure I showed. As I said, the most common kernel function is the Gaussian kernel, and that is what we used: D here is the distance, the Euclidean distance, y-hat is the predicted output, and the W's are the weights.
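Putting that together, here is a minimal sketch of the Gaussian-kernel estimator just described: the prediction is the weighted average of the stored outputs, with weights exp(-d^2 / (2 h^2)) computed from the Euclidean distances d. The bandwidth default of 2.6 simply echoes the value quoted a little later in the talk rather than a general recommendation, and the y = x test data are illustrative.

```python
# Minimal Gaussian-kernel ("lazy") regression sketch:
#   y_hat(x) = sum_i w_i * y_i / sum_i w_i,  with  w_i = exp(-d_i^2 / (2 * h^2)).
import numpy as np

x_train = np.arange(0.0, 31.0, 5.0)        # stored input examples
y_train = x_train.copy()                   # stored output examples (y = x test function)

def kernel_regression(x_query, bandwidth=2.6):
    d = np.abs(x_train - x_query)                  # Euclidean distance to each example
    w = np.exp(-d ** 2 / (2 * bandwidth ** 2))     # Gaussian kernel weights
    return np.sum(w * y_train) / np.sum(w)         # weighted average of stored outputs

print(np.array([kernel_regression(q) for q in (2.0, 8.0, 18.0)]))
```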
So, here is the difference from what we saw before: when we use the kernel regression, we get a much, much smoother and better output. But with everything I have said, again, there is no free lunch. With kernel regression you have to worry about what we call the bandwidth. The bandwidth, if you look at the Gaussian, is basically its standard deviation, and choosing it is again an optimization problem. If you increase the bandwidth too much, you end up with a prediction that does not follow the function, something like what is shown here. If you reduce it too much, you have the other problem of not getting a good result. So it is an optimization on the standard deviation, which we call the bandwidth.
>>The bandwidth should always be chosen to be large enough to cover the neighboring points; that is what we are talking about. We found, for example, in our problem, that around 2.6 was an acceptable bandwidth to use with the data. Of course you have some margin of error, and you expect that, but the error is very small compared to the other methods.
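As a minimal sketch of treating the bandwidth as an optimization problem, the snippet below scores a grid of candidate bandwidths against the known test function and reports the one with the smallest error; the grid values are illustrative, with 2.6 included only because it is the value quoted above.

```python
# Minimal bandwidth-selection sketch: too small gives a jagged fit, too large flattens it.
import numpy as np

x_train = np.arange(0.0, 31.0, 5.0)
y_train = x_train.copy()                   # y = x test function
x_eval = np.linspace(0.0, 30.0, 61)        # dense grid where the true values are known
y_eval = x_eval.copy()

def predict(q, h):
    w = np.exp(-(x_train - q) ** 2 / (2 * h ** 2))   # Gaussian kernel weights
    return np.sum(w * y_train) / np.sum(w)

scores = {}
for h in (0.5, 1.0, 2.6, 5.0, 10.0, 20.0):
    preds = np.array([predict(q, h) for q in x_eval])
    scores[h] = np.mean((preds - y_eval) ** 2)
    print(f"bandwidth {h:5.1f}: mean squared error {scores[h]:8.3f}")
print("smallest error at bandwidth", min(scores, key=scores.get))
```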
Kernel regression models are also divided into what we call the heteroassociative model, where the number of inputs does not equal the number of outputs, so you are looking at a different number of inputs and the outputs may not match, and the inferential model, where the number of outputs from the model is one: you are looking at one particular parameter, one thing, fed by multiple inputs. That is basically the math behind what we are going to explain in the next lecture. We found that the kernel regression was superior to the other methods being used; it gave us better prediction than other methods like neural networks or parametric methods. That is basically the whole point. The take-home message is that kernel regression is a very effective tool when we introduce it into a problem like predicting the soft tissue thickness and helping in the facial reconstruction. Thank you. Any questions?
>>A little too much scary math for people first thing in the morning?
[ Laughter ]
>>Sorry, we tried
to minimize the math as much as we could, but taking it out completely, I did not want it to look like a magic box. It is not; it uses well-known methods, just put together in a different way. I think the new thing here is taking something like kernel regression, which lives in hardcore computer vision and nuclear engineering, and trying to use it to estimate the soft tissue. That, I think, is the novel idea: to get away from training datasets, experts, and neural networks, which have been around for 30 years, and use something, as you will see in the next lecture, that works on the fly for predicting the thickness. Now.
>>I have a question for you. Sorry to interrupt. I know that over the
last couple of years in physical anthropology and forensic anthropology, and I think in paleoanthropology as well, they have all of a sudden discovered neural networks. So even though they have been around for so long, you would suggest that we not take that route?
>>Yes, and I will tell you why. In anthropology, by the way, and I will give you an example, because I have been working with the guys at UT for seven years now, since 2003, I have seen very sophisticated statistics, but it is like we are talking different languages. For example, linear discriminant analysis is huge in anthropology. A linear discriminant is the ABCs to us, because we are beyond that; all our problems are nonlinear. So instead of the linear discriminant, we use something called the quadratic discriminant.
>>And we've used this too.
>>Yeah. And the quadratic is more on the nonlinear side. So there is around a 20-year gap between the hardcore engineering methods being applied across all those applications and their moving over to another area like physical anthropology. So neural networks are a problem you are going to face, and I think I put it in the patella paper: I did compare a neural network and a linear discriminant, and both of them gave good results. The problem, as I said, is that neural networks, and I still use them sometimes, though I am moving to support vector machines, need a lot of tuning up front. You rely a lot on the expert, and how does the expert know which parameters and coefficients to use? He runs a number of experiments. So you have to have that, and the common wisdom with neural networks is this: if, for example, you are going to use a neural network, it is machine learning, so you are going to train; we call it training your model. You have to take your experiment and divide it, say into 25 percent for training and 75 percent that the system has not seen before. But in order to do this right and get decent results, your training dataset must be huge. Now, in some cases with the crania, you have 5 or 10 skulls from a Hispanic group, or whatever, relative to the number of points we are using. For example, take your own work: if you want to use all these points you are collecting with a neural network and training, you must have something like 10 times that number in your dataset, so
>>Right. It's just unrealistic
[simultaneous talking]
>>Unrealistic is the word. When you are in that situation, you have to use something else, and that something else is what we call the lazy training here, the kernel approach. I am fascinated by it, by the way, because it does not require all of that, and it gives decent results, though again you expect some errors. So that is my point. They are discovering neural networks because some people have started talking about them, and we were among the people who used neural networks in anthropology, but I think right now, you can try it, but you should move to the current methods being used in the computer vision, pattern recognition, and machine learning communities.
[ Pause ]
>>Any other questions? So, to the point:
kernel regression is the method that we use in the prediction, and I will give a small introduction here for the next lecture; I have about 5 minutes. The whole idea with the kernel regression is that we are not saying we are going to use a different method of reconstructing the soft tissue to help the identification; we are saying we are giving multiple scenarios. The idea is that we tie the kernel regression to scenarios, because the body mass index turned out to be crucial in the soft tissue thickness prediction. With the tables that exist, we could even use it as an input, or actually give not one but multiple scenarios. So if you find a skull and you do not have any other information, and the artist is going to render or build the soft tissues, you can help the artist, the forensic artist, by giving multiple scenarios, like saying, "Okay, what happens if this person has a certain body mass index?" Then he uses this software to get a different set of thicknesses. So basically he can render, or build, a model under multiple scenarios; you can have multiple scenarios, which gives more leverage in the identification. And we will see this in the next lecture, okay?
[ Music ]
[ Applause ]