Transparency and Reproducibility in Observational Research: Lessons From Anthropology


Our next two speakers are Melanie Martin and Bret Beheim. Melanie was a postdoctoral associate in biological anthropology at Yale. She studies growth and nutrition among indigenous populations in Bolivia and Argentina and registered her most recent project on the Open Science Framework. Melanie holds a BA from the University of Puerto Rico and an MA and PhD from UC Santa Barbara. She holds a certificate in
college and university teaching and has taught courses
in human variation, evolutionary medicine,
and statistical methods for the behavioral sciences. Her co-presenter, Bret Beheim, is a senior researcher at
the Max Planck Institute for Evolutionary Anthropology. He studies cultural and
technological change using both large scale
economic data sets and ethnographic field work
from small scale societies. He has shared several
research projects on GitHub. Professor Beheim holds a BS
from Emory University and a PhD from the University of
California at Los Angeles. Welcome both of you. OK. So thank you to Michelle
and the other organizers and presenters. I also want to
thank our colleagues in the Yale department– Eduardo Fernandez-Duque, Margaret Corley, and Juan Pablo Perea– who, together with Bret, have formed a working group to discuss analytical
methods in anthropology that have relevance to questions
about transparency and reproducibility in
observational research, which we’ll both be
talking about today. So to start off,
I actually wanted to explain some of the
research objectives in quantitative anthropology,
because most people are familiar with anthropology
as ethnographic studies of foreign cultures, which are
very important and critical to current anthropology. But there are also
many researchers who are studying cultural
and biological processes in a systematic fashion,
studying human and non-human primate evolution and
behavior, and then, of course, fossil and
archaeological remains from our ancestral
and more recent past. And in studying these topics,
we use theory and methods from a host of natural
and social sciences along with quantitative
methods of data collection and analysis. So some of this
research lends itself very easily to open,
transparent, and reproducible methods. For example, a recent study
published by Cody Ross– an anthropologist
now at Max Planck– examined evidence of racial bias in police shootings using a publicly available database
of US police shootings that’s actually crowdsourced
and maintained by Deadspin. One of the key findings of this study was that racial bias
was more prevalent in certain large
metropolitan areas, but wasn’t really correlated
with local crime rates. Instead it seemed to be more
predicted by higher income inequality in those areas. He also found that on
average across counties, unarmed black Americans were 3 and 1/2 times as likely to be shot by police as unarmed white Americans. So in addition to the
importance of these findings, this is a wonderful illustration
of open, reproducible research in anthropology. Because not only does the raw data come from a publicly available database, but Ross also made available
in the publication the cleaned and compiled data
sets that he used in the study and all of his code
for conducting the analysis. So that said though,
most quantitative anthropological
research is still done by individual researchers
or small groups of researchers working with data that
they collect, often in small, remote populations. So an example of that would be
a recent study from Brian Wood– an anthropologist here
at Yale– and colleagues who looked at physical activity
and cardiovascular disease risk among Hadza
hunter-gatherers in Tanzania. And what they found was that the Hadza spend about 14 times as much of their day engaged in moderate to vigorous physical activity as same-aged US subjects. You can also see in this
plot from that study– the blue line shows males and the green line females– that activity levels don’t decline with age. In fact, some of
the older adults had relatively higher
activity levels, which is not a trend that
we typically see in the US. They also found in
a separate study that across ages in the Hadza
we see a very low prevalence of hypertension and
other biomarkers of cardiovascular disease risk, such as high LDL cholesterol, high C-reactive protein, and high triglycerides. Similar results have been shown in other subsistence-scale indigenous populations
and together contribute to our growing
understanding that many of the chronic diseases that
plague citizens in our country and other westernized
nations, including cardiovascular disease risk,
obesity, diabetes, and cancer– these are really diseases
of modern civilization. And we just don’t see
them in populations that still engage in
relatively ancestral behaviors with high activity levels and
relatively low caloric intake. These are some pictures from the
data collection in that study. So you can see subjects
going about their daily lives climbing trees to collect
honey, digging for tubers, and just hanging out. And all of these subjects
have heart rate monitors and GPS monitors
strapped to them, which the researchers
used to actually get the data about their
physical activity. And I illustrate
these really just to show you that this data
is really hard to get. Because first off,
they’re working in a remote area of Tanzania. This is also only
possible because Dr. Wood and his colleagues
have spent decades building trust and understanding
with this population. And on top of that,
they’re collecting electronic and biomarker
data in an area without electricity or
internet or refrigeration. And then on top of all those
logistical difficulties, if you have ever
done field work, you know that in the field,
everything goes wrong and nothing goes to plan. And you have to work
from plan B, C, D, and change your original
design concept multiple times. So for this reason,
anthropological research really is often constrained
to very small data sets or small populations with
one-shot data collection that also has to be very
flexible at the same time. So I think this raises
questions about what transparent and reproducible
research in anthropological and other field
studies really is. So to examine that,
we might start by first acknowledging what
observational research can and can’t do. So first off, really,
we don’t usually have experimental controls,
because these may be impossible and often unethical. Second, we are often working with a small sample size, or the sample is actually the entire population, as when you’re working with an available skeletal sample. Also, our populations are often specific to a certain time and place, and they might be rapidly changing, which is happening with many subsistence-level cultures like the Hadza. In that sense, as anthropological researchers we can never resample our populations. And in that sense, our data
can never be replicated. OK. Secondly, we might
really examine what the goals of
observational research are. And this has been
echoed earlier today too in terms of acknowledging that
very often, we are exploring and describing phenomena and generating
patterns that we observe. At the same time,
researchers might be testing existing hypotheses
from existing theories, but using proxies in
observational research rather than
experimental controls. That said, even though we lack
experimental controls to show pure causality, we are
still interested in showing empirical support
for these hypotheses and establishing
robust associations, as these findings then
can inform future research and policy in certain cases. OK. So on top of acknowledging
these challenges and goals that are sometimes specific
to observational as opposed to experimental
research– as anybody familiar with the history of Margaret Mead’s research might know, there’s a long history of questioning methods and transparency in anthropology. So thinking about
all these things, we have tried to identify
certain practices that might promote transparency and
reproducibility in anthropology and other observational studies. So with that, I’m going
to hand the talk over to my co-presenter, who is going
to discuss some of these areas, again, that have been mentioned
today in terms of identifying and using appropriate
statistical methods for exploratory versus
confirmatory analysis and emphasizing
reproducible methods rather than replicability of results
through data management and sharing, registering
analytical protocols, and then internal incentives to
promote this behavior. Yes. Thank you. As Melanie described, in anthropology, to paraphrase Monty Python, every data point is precious. And often the information that we have about a particular site or historical location is possibly the only information that will ever be known about that particular population. So we have a somewhat
unique situation. But in anthropology, as in most of the physical and natural sciences, there’s active
interest in the open science and reproducibility
movement that I think has kind of been
percolating up here. So in terms of these
statistical aspects of quantitative
anthropology, there’s a renewed interest
in developing methodologies that are suited towards
observational studies with small samples. And often, this is
predicated on the idea that the traditional
methods in statistics were often developed for situations that don’t really apply to the kinds of analyses
and data sets we’re using. Often, they were
developed– in the case of many frequentist statistics– in the context of strict treatment-control experimental studies,
where we have the ability to model exactly how the
analysis is going to happen before the data is collected. In anthropological research–
observational research in anthropology– often
we forego the direct goal of establishing causality
or treatment effects, and focus instead on
trying to establish reliable or robust associations
between measurements. And there’s also a
small difference, but I think a very important
one, between arguing for a causal mechanism, as opposed to predicting
a particular phenomenon without necessarily
making direct claims about the causality. This is, of course,
the same idea when it comes to Bayesian
modeling techniques applied to predicting outcomes. So much like election predictions and Moneyball and the new wave of analyses
science, often this takes on tools drawing
from Bayesian approaches from information theory,
machine learning methods as described, for example,
in Hastie, Tibshirani, and Friedman. These are all
techniques which are coming into anthropological analyses just as much as they are
into other social sciences. One other aspect, which comes up quite a bit with us especially in anthropology, and which is maybe what we’re talking about the most here, is the difference between exploratory and hypothesis-oriented research– or, the way that Brian Nosek put it yesterday in his talk, thinking about looking at something prospectively as an exploratory approach or as a confirmatory one that scrutinizes an existing theory. And this is another great
paper by Paul Rosenbaum, which was written
specifically for psychology. But I think it’s applicable
to anthropology as well, as a justification
for what we can call nontraditional types
of papers, papers that are oriented around
exploring data sets in a way that could
be described negatively as data dredging or HARKing, which is hypothesizing after the results are known– both of which have relatively recently been identified as serious problems behind the non-reproducibility plague that seems to be
happening under our noses. But from an anthropological
point of view, this is often all we can do. We don’t have necessarily
strong hypotheses coming into a particular
field site or location. And we struggle
with the framework of a strict hypothesis
testing sort of idiom, simply because in
many cases, it’s not appropriate, or
the analysis itself motivates particular
relationships between variables. So how to navigate that
with the obvious problems with things like HARKing
is something that I think is worth talking about more. I will mention one
particular study that I was part of that came
out this year, because I thought that it had a relatively
novel approach at least in anthropology
to doing the data analysis and the presentation. And as Melanie mentioned,
sometimes the reproducibility of the methods is possibly
the best we can do, because we’re dealing with field
populations or with historical archaeological data, which
cannot be replicated. We cannot do another study from
the same source and collect more data from that source. So this is a
particular project that was spearheaded by Siobhan
Mattison at the University of New Mexico collecting
data on parental investment and decision making
in a minority group in southwest China. And the particular research
question that she proposed was: is there evidence that
women facultatively change their reproductive
stopping behavior? That is, the number
of children that they will have conditional
on the current sex composition of the children they already have. So the way that we visualize
the particular results of this model– we
have two different sets of communities within
this population– the Moso of southwest China. And these sort of
honeycomb shapes here represent a sequence
of birth decision making, depending on the existing
number of children and how many children
are boys versus girls. So starting at the top of
each one of the pyramids, we imagine a woman
with no children having a probability of
continuing reproduction– the probability of having
another child– the parity progression ratio as it’s
called in demography. And the orange pyramid starts
at 93% and the blue pyramid at 88.9%. I’ll explain what the
colors mean in a second. And then conditional on having
a particular child, either a boy or girl, we move
down the pyramid into one of the
corresponding cells. And we have another stopping
probability– rather, a continuation probability–
the probability of continuing on to have another child. The stratification here
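The continuation probabilities in these pyramids are, in effect, parity progression ratios computed within each sex-composition cell. A minimal sketch of that bookkeeping, using made-up birth histories rather than the study’s actual data or R code:

```python
# Sketch only: parity progression ratios conditional on sex composition,
# from hypothetical completed birth histories (not the study's data).
from collections import defaultdict

# Each woman's birth history as a sequence of child sexes (B = boy, G = girl).
histories = [
    "GG", "GGB", "B", "BG", "GGG", "BB", "G", "GGB", "BGB", "GG",
]

at_risk = defaultdict(int)      # women observed in this (parity, boys) state
progressed = defaultdict(int)   # of those, how many went on to another child

for h in histories:
    for parity in range(len(h) + 1):
        state = (parity, h[:parity].count("B"))  # (n children, n boys so far)
        at_risk[state] += 1
        if parity < len(h):
            progressed[state] += 1

# Parity progression ratio for each cell of the "pyramid".
for state in sorted(at_risk):
    print(state, round(progressed[state] / at_risk[state], 2))
```

For example, the cell for two children, both girls, is the share of women whose first two children were daughters who went on to have a third child.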
between the patrilineal and the matrilineal
is theoretically interesting in this
particular study, because we have reasons
to think that groups where property and status and
titles are inherited through the mother’s
line might have a different view of the marginal
value of having another child if it’s going to
be a boy or a girl than groups on the patrilineal side of things. And in this
particular society, we have both kinship systems
taking place simultaneously in different villages in
the same ethnic group. So there’s a lot of
similarities between the two, except for a very
marked difference in their kinship systems. And indeed, we do see as you
can see by the colorations here, a relatively strong
implication that women with two sons– or sorry, excuse
me– two daughters in the patrilineal group
have a much higher chance of continuing on and
having a third child than in the matrilineal
group having two sons– or sorry– two daughters. So the reason that I mention
this particular study is that we have the data
and all of the materials, including the R code that this was done in, available on GitHub, with the link down at the bottom there. And the paradigm that
we’re working in here borrows language from open-source software environments, which I think have a lot of idiomatic similarities to research. So this is a potential
visualization of a research project done under
strict version control system, like using Git,
for example. And I think there’s
a lot of similarities in this picture to what we
saw during Brian’s talk. Each of these nodes represents
a particular snapshot of a project continuing
on from the initialization of the project all the way to
publication or post-publication peer review feedback on some kind of Amazon-comments-type system, like we heard
during Alan’s talk. And the contention
that I have here is that this kind
of framework– at least in terms of laying out how a project might develop– is compelling for us as
anthropologists, because it implies there’s a chronological
sequence that’s stored, that can be shared. This particular
visualization here also has different branches
that correspond to– we could imagine– branches that we are willing to make public–
maybe the blue line at the top there, the master branch. And then other components
during the development process that for reasons of
confidentiality or sensitivity, we don’t have the
ability to disclose or don’t have available for public release. And this is maybe the last
point that, in anthropology in particular, it’s very often the case that anthropologists see themselves not only as researchers of a particular population,
but also as advocates for that population. There’s usually a very
large power difference between the groups
that we work with and ourselves as researchers. And so that informs the way
we look at data releases. And I can say, just speaking
from my own ethnographic experience with
anthropology, we generally are very much data hoarders. And part of the reason
why is because we’re afraid there’s going to
be unintended consequences of making too much information
available about the groups we work with. So in that case, in
this particular project with the Moso, I felt that the
solution or the happy medium was to make available strictly the variables and the data points that were used for the analysis, and nothing else. And to some extent, the data were scrambled so that they couldn’t be used to match up to individual people, without distorting any of the signals in the data analysis. So with that, I think we
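That release strategy– keep only the analysis variables, recode identifiers, and shuffle row order– can be sketched in a few lines; the field names here are hypothetical, and this is not the project’s actual code, which lives in its GitHub repository:

```python
# Sketch only: scrubbing field records before public release.
# Field names below are hypothetical, not the Moso project's.
import random

ANALYSIS_VARS = ["n_children", "n_boys", "kinship_system"]

def scrub(rows, seed=None):
    """Keep only analysis variables, assign random ID codes, and shuffle
    row order so records can't be matched back to individuals, while
    leaving the analysis variables themselves untouched."""
    rng = random.Random(seed)
    codes = list(range(1, len(rows) + 1))
    rng.shuffle(codes)  # pseudonyms unrelated to collection order
    scrubbed = [{"id": c, **{v: r[v] for v in ANALYSIS_VARS}}
                for c, r in zip(codes, rows)]
    rng.shuffle(scrubbed)  # decouple release order from field records
    return scrubbed

field_records = [
    {"name": "A", "village": "X", "n_children": 3, "n_boys": 1,
     "kinship_system": "matrilineal"},
    {"name": "B", "village": "Y", "n_children": 2, "n_boys": 2,
     "kinship_system": "patrilineal"},
]
released = scrub(field_records, seed=1)
```

The joint distribution of the analysis variables is preserved exactly, so any published result can still be reproduced from the released file.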
should mention two things that are sort of
developments here that I think are very hopeful or
insightful in anthropology. Mel is responsible for
one of the first projects on the Open Science Framework in anthropology– pre-registering an analysis she’s doing right now on age at menarche. And also, to again plug the badges, which are, of course, the other Open Science Framework innovation here. This is something which
anthropologists here at Yale have advocated for
in the International Journal of Primatology. And you can see
the citation here. So yeah. In conclusion, we
think that there’s a lot of scope for these
ideas in anthropology. But there are some concerns or
qualifications that are maybe unique to our particular field. And thank you. [APPLAUSE]