Science Where's the Science Cave?!



Grinner
2014-05-06, 07:36 AM
When I listen to the radio or read the news, there's always some new data suggesting this and new data suggesting that.

I have to wonder just one thing: where is all of this data? I do read scientific literature on occasion, and I never find charts upon charts of raw data. There's usually a summary of the data, but never the data itself.

I ask because I had an idea for a program which would attempt to correlate any datasets given to it, but it's difficult to find data in the first place. The best I've found is data.gov (http://www.data.gov), but the selection there is fairly limited.
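
For what it's worth, the core of such a program can be quite small. The following is only a rough sketch of the idea, assuming every dataset is already a clean, equal-length list of numbers (which, in practice, is the hard part). The function names `pearson` and `correlate_all` and the sample numbers are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlate_all(datasets):
    """Return (name_a, name_b, r) for every pair of datasets, strongest |r| first."""
    results = [
        (a, b, pearson(datasets[a], datasets[b]))
        for a, b in combinations(sorted(datasets), 2)
    ]
    return sorted(results, key=lambda row: abs(row[2]), reverse=True)


# Made-up numbers purely for illustration, not real data.
datasets = {
    "series_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "series_b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "series_c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
for a, b, r in correlate_all(datasets):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

A high r here only says that two columns of numbers move together; it says nothing about why.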

Eldan
2014-05-06, 07:44 AM
Somewhere on a secure university server. Several reasons. First of all, scientists hoard their data like crazy. If the data got out, someone else could publish first, which ruins careers. Second, publication lengths in most journals are severely limited. Science gives you a few hundred words, sometimes. Others give you a few pages. There's simply no space to show your raw data. Often, there's a footnote telling you which author to contact if you have questions. Some may decide to share the data with you.

Grinner
2014-05-06, 07:55 AM
Originally Posted by Eldan
First of all, scientists hoard their data like crazy. If the data got out, someone else could publish first, which ruins careers.

Makes sense, though as my second grade teacher would tell you, using someone else's work is cheating. It's disturbing to think that an elementary school classroom has stronger ethics than an academic laboratory.

Is there a reason for sitting on the data after you've published, though? Maybe I'm just spoiled by digital formats, but it seems like it would be easy enough to do.

Eldan
2014-05-06, 08:07 AM
Ethics? No. It's pretty cutthroat, actually. Your number of publications and their importance determines how much funding your lab gets. If the lab doesn't have enough funding, it has two options. Fund cheaper projects (which means everyone publishes less important articles) or fire people.

A giant electronic database could be easy. There's several problems, though. Scientific data is often a giant mess no one but whoever wrote it in the first place will understand. Many people are hired for anything from half a year (interns, master students) to a year (Post-Docs) to three years (PhD students). They gather data, they publish, they leave and the data sits somewhere in a forgotten folder.

Even on a small project with three or four people working on it, you usually spend a lot of time just asking what all the abbreviations in your colleague's dataset mean. And the colouring. And why there are points missing. And what the units are.

Grinner
2014-05-06, 09:15 AM
Originally Posted by Eldan
Even on a small project with three or four people working on it, you usually spend a lot of time just asking what all the abbreviations in your colleague's dataset mean. And the colouring. And why there are points missing. And what the units are.

I laughed at this.

Thanks. :smallsmile:

Eldan
2014-05-06, 09:26 AM
Yeah. I opened an Excel file from a colleague today. It had 12 sheets in it, including pivot tables of other pivot tables. It somehow started with 5000 samples and got that down to 300 data points.

And the variable names were building up.

Hair (h)-> Intermediary hair (IntH) -> Species as determined by the presence of intermediary hair (SpIntH)/Species determined without looking at intermediary hair as a factor (SpNIntH) -> Majority species in a population as determined by the presence of intermediary hair (MajSpIntH)

And so on.

Then the statistical analysis on top of that.

Clustering of the individual as based on PCA analysis of the majority species in a population as determined by the presence of intermediary hair (ClustMultiPCMajSpIntH).

CarpeGuitarrem
2014-05-06, 09:46 AM
Speaking as someone in the tech industry...

...is there any sphere that isn't somehow incredibly dysfunctional like this? :smallbiggrin:

NichG
2014-05-06, 02:51 PM
Actually, in many fields it's considered the ethical responsibility of the author of a journal article to provide raw data on request. I wouldn't say people sit on it to protect their secrets, but rather because the actual raw data may often be gigabytes or terabytes of disorganized, custom file format stuff. Preparing a publication takes long enough as it is - adding the time to curate that kind of data and make it accessible would significantly exacerbate that.

For example, I did a paper on 3D simulations of granular jet impact. We published a graph of the shape and averaged fields (density, velocity, pressure, etc) in the paper. The actual raw data is sitting on a terabyte external drive on my desk that I bought when I moved so I could take the data with me. Most of it is bzipped save-state files for the simulation code, which contain the individual positions/velocities of a few million grains every few tens of thousands of timesteps. I could have included those particular simulation files in the supplemental material, but in practice it's just easier and more useful to give out the code and initial conditions and let people generate it on their own.

I'm not really concerned about anyone scooping me on it or anything like that.

DrK
2014-05-06, 04:31 PM
We routinely check the raw data when reviewing papers relating to protein structure. You'd be surprised at the tricks people try to play hiding occupancy or poor torsion chemistry in the important structural bits.

Also, many funding bodies in the UK require free public access and raw data available as supplemental info for biochem papers (my field, so I wouldn't want to comment on the other sciences).

Hiro Protagonest
2014-05-06, 06:14 PM
Originally Posted by Grinner
Makes sense, though as my second grade teacher would tell you, using someone else's work is cheating. It's disturbing to think that an elementary school classroom has stronger ethics than an academic laboratory.

Not really. Pretty much any adult enterprise has weaker ethics than an elementary school classroom.

The Grue
2014-05-06, 06:53 PM
Originally Posted by Hiro Protagonest
Not really. Pretty much any adult enterprise has weaker ethics than an elementary school classroom.

The problem is an elementary school classroom has a tangible reward for good behaviour. Out there in the adult world there's no teacher to give you a gold star and extra marks for following the rules or make you sit in the hallway if you don't; the Prisoner's Dilemma reigns supreme. Sure, you could follow the rules and play nice with the other grown-ups, but if one of them decides to break the rules to gain an advantage you get screwed. The safest thing to do is break the rules and screw them first.

warty goblin
2014-05-06, 09:34 PM
Originally Posted by The Grue
The problem is an elementary school classroom has a tangible reward for good behaviour. Out there in the adult world there's no teacher to give you a gold star and extra marks for following the rules or make you sit in the hallway if you don't; the Prisoner's Dilemma reigns supreme. Sure, you could follow the rules and play nice with the other grown-ups, but if one of them decides to break the rules to gain an advantage you get screwed. The safest thing to do is break the rules and screw them first.

There is a tangible reward for good behavior in academia at least; it's called 'still having a career.' Seriously, if you get caught plagiarizing or engaging in other forms of academic dishonesty you'll get buried so deep the paleontologists won't be able to find you.

The Grue
2014-05-07, 03:12 AM
Originally Posted by warty goblin
There is a tangible reward for good behavior in academia at least; it's called 'still having a career.' Seriously, if you get caught plagiarizing or engaging in other forms of academic dishonesty you'll get buried so deep the paleontologists won't be able to find you.

The four key words there being "if you get caught".

Knaight
2014-05-07, 03:29 AM
Originally Posted by The Grue
The four key words there being "if you get caught".

Which, in many cases, takes no more than someone who reads your paper noticing material that came from a previous one. Such as the authors of the previous paper. Even if the ethics are completely ignored, plagiarism is a dumb idea.

Eldan
2014-05-07, 03:31 AM
Yup. There was just a mid-sized scandal over here when it turned out that a Federal politician (the equivalent of a US Senator, I think) and Professor of Medicine, had given out about a dozen PhDs to students who hadn't earned them and apparently hadn't even written any real theses.
So, big "if".

NichG
2014-05-07, 01:11 PM
Direct plagiarism isn't really the sort of unethical thing that's very commonplace. There's shady stuff going on, but it tends to be a lot harder to prove or do anything about. For example, referees may tank a paper in review because they're going to publish a similar one, or one with an opposing view, and want to have the first word (or for various other reasons). Or people share information gained in confidence with someone who then rushes a research project to finish the idea before the original group does (e.g. one group has a grad student working on a project for their thesis on a 5 year timescale, but another group hears about the main ideas, drops a postdoc on it, and rushes it out the door in 6 months).

That said, I'm not convinced that even this kind of thing is incredibly common. We all get frustrated at referees, and because of that particular emotional filter it's hard to really tell when they're just having trouble understanding the paper and when they're being deliberately obtuse in order to scuttle something. It's easy to think that there is intentional malice, or that you recognize the anonymous referee based on writing style or what citations they want you to include.

Similarly, not every case of research getting scooped has to do with someone using confidential information to get a leg up - generally there is a small set of low-hanging fruit in any field that everyone is going for, so it's pretty common to see a bunch of papers come out on a single topic almost simultaneously as everyone takes a shot at it. In terms of the granular jet thing, I talked to one researcher who said 'yeah, we had a project like that years ago but it didn't go anywhere' and there was another group actively pursuing it and publishing on the topic at the same time as we were (thankfully their focus was different from ours, so it was more of a back and forth rather than scooping each other).

shawnhcorey
2014-05-07, 06:34 PM
Originally Posted by The Grue
The problem is an elementary school classroom has a tangible reward for good behaviour. Out there in the adult world there's no teacher to give you a gold star and extra marks for following the rules or make you sit in the hallway if you don't; the Prisoner's Dilemma reigns supreme. Sure, you could follow the rules and play nice with the other grown-ups, but if one of them decides to break the rules to gain an advantage you get screwed. The safest thing to do is break the rules and screw them first.

Actually, for the repeated (iterated) Prisoner's Dilemma, one of the most successful known strategies is Tit for Tat (https://en.wikipedia.org/wiki/Tit_for_tat). Basically, you cooperate in the first round and then do whatever the other guy did in the previous round.
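
As a rough sketch of how that rule plays out over repeated rounds, here is a tiny simulation. The payoff values (3/0/5/1) are the conventional textbook ones and are an assumption here, as are the names `play`, `tit_for_tat` and `always_defect`.

```python
# Payoffs as (my_points, their_points); "C" = cooperate, "D" = defect.
# The 3/0/5/1 values are the usual textbook choice, assumed for illustration.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate in the first round, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Break the rules every round, regardless of what the opponent did."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds; each strategy only sees the other side's past moves."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        points_a, points_b = PAYOFFS[(a, b)]
        score_a += points_a
        score_b += points_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # two cooperators
print(play(tit_for_tat, always_defect))    # exploited once, then retaliates
print(play(always_defect, always_defect))  # everyone screws everyone
```

With these assumed payoffs, Tit for Tat never beats the player sitting across from it in a single match; it does well because pairs of cooperators rack up far more points than pairs of defectors.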

Samuel Sturm
2014-05-07, 09:14 PM
Originally Posted by Eldan
A giant electronic database could be easy. There's several problems, though. Scientific data is often a giant mess no one but whoever wrote it in the first place will understand. Many people are hired for anything from half a year (interns, master students) to a year (Post-Docs) to three years (PhD students). They gather data, they publish, they leave and the data sits somewhere in a forgotten folder.

Even on a small project with three or four people working on it, you usually spend a lot of time just asking what all the abbreviations in your colleague's dataset mean. And the colouring. And why there are points missing. And what the units are.

Oh my sweet unholy ***** this.

I'm currently working on a project involving 16k records of furniture sales. Over half my time on this project has gone into cleaning up the data so that it's usable, given how the data entry people entered it. Sure, I'll provide my raw data to anyone who asks nicely. Good luck.

endoperez
2014-05-08, 12:32 PM
Look up the Wolfram programming language; it's basically this. It's a programming language that incorporates data. For example, you can output a string of 'hello (randomly return the capital of one of the 10 biggest countries by income per capita)'.

Palanan
2014-05-09, 11:14 AM
Originally Posted by Eldan
Science gives you a few hundred words, sometimes. Others give you a few pages. There's simply no space to show your raw data.

There's always SOM (supplemental online material) which can give you the option to present a lot more information, at least in terms of methods and results--which is really crucial to the nuts and bolts of the science, but which a lot of casual readers will skip over anyway. --I do agree with your point, though, that journals themselves don't have room for the raw data itself.

As for server storage and so forth...I suppose this depends on the research and the institution. My data are obsessively backed up and located on a variety of disks and portable drives, most of which are within twenty feet of me right now. My advisor probably has copies on a backup drive (assuming it survived a nasty disk failure a couple years ago) but that's about it. Nothing of mine is on a server that I know of. Published articles, yes, but not the original data.


Originally Posted by warty goblin
There is a tangible reward for good behavior in academia at least; it's called 'still having a career.' Seriously, if you get caught plagiarizing or engaging in other forms of academic dishonesty you'll get buried so deep the paleontologists won't be able to find you.

Sadly, it's not always that straightforward. Sometimes, yes--and sometimes not.


Originally Posted by Eldan
There was just a mid-sized scandal over here when it turned out that a Federal politician (the equivalent of a US Senator, I think) and Professor of Medicine, had given out about a dozen PhDs to students who hadn't earned them and apparently hadn't even written any real theses.

On the other hand, something on this scale is pretty hard to ignore.


Originally Posted by Eldan
Scientific data is often a giant mess no one but whoever wrote it in the first place will understand.

And this is so, so true.

When I was working as a field tech right after college, I asked my supervisor if she was worried that anyone might make untoward use of our raw field data, since we were dealing with some locally sensitive issues.

Her answer: "I dare anyone to make sense of it!" She was joking, but it was true enough.

Eladrinblade
2014-05-09, 06:11 PM
Originally Posted by Palanan
And this is so, so true.

When I was working as a field tech right after college, I asked my supervisor if she was worried that anyone might make untoward use of our raw field data, since we were dealing with some locally sensitive issues.

Her answer: "I dare anyone to make sense of it!" She was joking, but it was true enough.

This is why spellcraft is a skill. Too bad you guys don't have read magic.

Zrak
2014-05-09, 07:09 PM
Originally Posted by warty goblin
There is a tangible reward for good behavior in academia at least; it's called 'still having a career.' Seriously, if you get caught plagiarizing or engaging in other forms of academic dishonesty you'll get buried so deep the paleontologists won't be able to find you.

You mean buried so deep in money and paperwork from still being chair of your department, right? Because that's what happens if you get caught just straight making up the results of drug trials you were paid to perform.

Bulldog Psion
2014-05-09, 09:43 PM
If scientific data really is that compartmentalized, hidden, chaotic, and encoded in personal abbreviations, then I truly stand in awe of the fact that humanity has created a massive airline network, gotten to the moon, created extremely reliable vehicles and computers, and so on.

Just think what we would be capable of if there was a central database of knowledge, presented in a comprehensible format. :smalleek: We'd already be working on our galactic imperium on three different planes of the multiverse, probably.

warty goblin
2014-05-10, 09:26 AM
Originally Posted by Bulldog Psion
If scientific data really is that compartmentalized, hidden, chaotic, and encoded in personal abbreviations, then I truly stand in awe of the fact that humanity has created a massive airline network, gotten to the moon, created extremely reliable vehicles and computers, and so on.

Just think what we would be capable of if there was a central database of knowledge, presented in a comprehensible format. :smalleek: We'd already be working on our galactic imperium on three different planes of the multiverse, probably.

Actually, I strongly suspect that if we kept all the data in one giant central database, we'd never have gotten any of that stuff done at all and would be working full time to standardize, document and encode our data to be up to spec, or else trying to bludgeon it out of the standardized format and into one that was actually useful for the project at hand.

Palanan
2014-05-11, 10:52 AM
Originally Posted by warty goblin
...or else trying to bludgeon it out of the standardized format and into one that was actually useful for the project at hand.

o GAWD so true.

I once worked on a project for...an institute which shall remain nameless, which involved doing species assessments according to their massive, cumbersome, thoroughly overengineered data-entry template. They farmed out segments of the total project to universities around the country.

The first problem was, you really can't shoehorn each and every species of interest into a single all-encompassing stack of spreadsheets. Problematic example: they had a whole subsection devoted to the size of geographic ranges. We were supposed to work out the polygons that approximated each species' known range across the continent, and then plug in values for total area and etc. If a certain species is only found in western Maryland, you work out the square kilometers of that area in western Maryland. Looks simple on the face of it.

But what do you do with a freshwater mussel? They don't live in easily delineated landscape polygons; they live in riverine networks, which are dendritic threads running through those landscapes. You can easily define the watershed for a river, but mussels don't live in watersheds; they live in the streams those watersheds feed into. It's a critical distinction the spreadsheet programmers completely overlooked--and thus, an entire subsection of the standardized global template was completely meaningless for the species involved.
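
To make the mismatch concrete, here is a small sketch contrasting the two kinds of measurement: an area for a range polygon versus a total length for a branching stream network. The coordinates are invented, are assumed to be planar and in kilometres, and the function names `polygon_area` and `network_length` are hypothetical; none of this reflects the institute's actual template.

```python
from math import hypot


def polygon_area(vertices):
    """Shoelace formula; vertices are (x, y) in km, listed in order around the polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def network_length(polylines):
    """Total length in km of a branching network given as a list of polylines."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for line in polylines
        for (x1, y1), (x2, y2) in zip(line, line[1:])
    )


# Hypothetical watershed outline and the streams running through it (km).
watershed = [(0, 0), (10, 0), (10, 10), (0, 10)]
streams = [
    [(5, 0), (5, 5)],  # main stem
    [(5, 5), (2, 9)],  # tributary
    [(5, 5), (8, 9)],  # tributary
]

print(f"watershed area: {polygon_area(watershed):.0f} square km")
print(f"stream length:  {network_length(streams):.0f} km")
```

A template field that only accepts "square kilometres of range" has nowhere sensible to put the second number, which is the one that actually describes where the mussels live.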

Worse yet, the parent institute refused to listen, on this and many other issues. Early on I came up with a shortcut to save hours of data entry time, which involved changing one of the default values to a number that made more sense for what I was doing. They acknowledged my way was better, but they wouldn't hear of implementing it. They'd rather waste hours of someone else's typing than spend seven minutes recoding the template.

My group ended up dropping that project; it was immensely more work than we'd been promised, and the parent institute was too locked into their top-down command mode to even make a pretense of listening to the people actually doing their work for them. It was a classic example of how not to run a massive collaborative project.


Originally Posted by Bulldog Psion
If scientific data really is that compartmentalized, hidden, chaotic, and encoded in personal abbreviations, then I truly stand in awe of the fact that humanity has created a massive airline network, gotten to the moon, created extremely reliable vehicles and computers, and so on.

Also, keep in mind that Eldan, WG and I are mainly discussing academic research projects, which are very different from research programs controlled by governments, corporations or even larger NGOs. In those latter cases, especially with governments, there's probably much more data-sharing within programs, although I'll defer to anyone who's worked in federal research.

As for the international airline network, that's probably a classic case study in the evolution of a complex artificial system, since as far as I know it grew up gradually rather than being designed from the start. I know early air connections were pretty haphazard, not least because aircraft in the 20s and 30s really had to worry about weather conditions we'd barely notice today. That would be really interesting to look into, though--how modern commercial aviation developed an integrated global network.

warty goblin
2014-05-11, 11:18 AM
On the project I'm working on, we have these weekly meetings, which are mostly about how much my co-researcher wants to spend an enormous amount of time standardizing everything into a relational database and this that and the other thing.

It's a statistics project. At the outside there's a couple hundred datasets to deal with, broken up by state. I solved this problem months ago by putting all the datasets for a state in a folder with that state's name. But no, we need to have an Official Abbreviation List for the numeric codes we use, and an Official Correspondence Table between this data and that data and an Official This and Official That. It's like the guy thinks we're running an enterprise software team and hundreds of people will need to be able to use our code and data. Our data is various levels of confidential, is complicated enough that it's irresponsible to release the non-confidential parts, and odds are very good that over the entire lifetime of the project maybe a dozen people will ever see this stuff. It's worth documenting; but I don't think we need an Official R Code Style Guide, or an Official Text Abbreviation for Cultivated Cropland. The other day this dude was complaining there wasn't a universally agreed upon definition of forest.

None of that stuff gets us any closer to actually producing a usable statistical product. Which is the thing we're actually being paid to produce.

SiuiS
2014-05-11, 12:05 PM
Originally Posted by Grinner
Makes sense, though as my second grade teacher would tell you, using someone else's work is cheating. It's disturbing to think that an elementary school classroom has stronger ethics than an academic laboratory.

Is there a reason for sitting on the data after you've published, though? Maybe I'm just spoiled by digital formats, but it seems like it would be easy enough to do.

Not ethics, oversight. Second graders will get caught, because there is no cheating in the field above them. The scientists who make their living off of getting grants tend to cheat as a rule, so it's more like professional cycling in that someone who doesn't cheat is going to stand out more, if you see them at all.


Although I love the idea that science is mostly predicated on whether or not you choose to believe someone whose career and livelihood are on the line but who simultaneously has literally no incentive for transparency. Totally acing that whole free flow of information thing, Science!

MadZuri
2014-05-11, 12:12 PM
There are some interesting and meaningless correlations (http://tylervigen.com/) that you can make from data sets. Use with caution and a sense of humor.
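
A quick way to see why such matches turn up: generate a pile of series that are unrelated by construction and look for the best-correlated pair. This is only a sketch under stated assumptions (Gaussian random walks, 30 series of 10 points, a fixed seed); the names `random_walk` and `strongest_pair` are made up for the example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def random_walk(rng, length):
    """A series of accumulated random steps, independent of every other series."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def strongest_pair(num_series=30, length=10, seed=0):
    """Return (i, j, r) for the pair of unrelated series with the largest |r|."""
    rng = random.Random(seed)
    walks = [random_walk(rng, length) for _ in range(num_series)]
    return max(
        (
            (i, j, pearson(walks[i], walks[j]))
            for i, j in combinations(range(num_series), 2)
        ),
        key=lambda item: abs(item[2]),
    )


i, j, r = strongest_pair()
print(f"strongest of 435 unrelated pairs: series {i} vs {j}, r = {r:+.2f}")
```

With 30 series there are 435 pairs to choose from, so the winner can look impressive even though nothing connects the two series.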

NichG
2014-05-11, 02:13 PM
Originally Posted by warty goblin
On the project I'm working on, we have these weekly meetings, which are mostly about how much my co-researcher wants to spend an enormous amount of time standardizing everything into a relational database and this that and the other thing.

It's a statistics project. At the outside there's a couple hundred datasets to deal with, broken up by state. I solved this problem months ago by putting all the datasets for a state in a folder with that state's name. But no, we need to have an Official Abbreviation List for the numeric codes we use, and an Official Correspondence Table between this data and that data and an Official This and Official That. It's like the guy thinks we're running an enterprise software team and hundreds of people will need to be able to use our code and data. Our data is various levels of confidential, is complicated enough that it's irresponsible to release the non-confidential parts, and odds are very good that over the entire lifetime of the project maybe a dozen people will ever see this stuff. It's worth documenting; but I don't think we need an Official R Code Style Guide, or an Official Text Abbreviation for Cultivated Cropland. The other day this dude was complaining there wasn't a universally agreed upon definition of forest.

None of that stuff gets us any closer to actually producing a usable statistical product. Which is the thing we're actually being paid to produce.

This kind of personality crops up all over the place I think. You see it in programming projects too, when someone decides 'hey, Python is a better language than Fortran for this, we should rewrite this entire program in Python because that'll make it a lot easier to do the minor changes we have to make' and things like that. I've seen a lot of open-source projects stall out when someone decided to do a massive refactor and interest in the project just dies, since there's no movement on the main features or even bug fixes for the year or so it takes them to get through the refactor.

It seems like it's just a genuinely hard task to figure out the right level of organization for a project - people generally tend to massively overshoot or undershoot until they've done five or six very similar projects and basically just know the answer from experience.

the_druid_droid
2014-05-12, 02:04 AM
Originally Posted by SiuiS
Not ethics, oversight. Second graders will get caught, because there is no cheating in the field above them. The scientists who make their living off of getting grants tend to cheat as a rule, so it's more like professional cycling in that someone who doesn't cheat is going to stand out more, if you see them at all.


Although I love the idea that science is mostly predicated on whether or not you choose to believe someone whose career and livelihood are on the line but who simultaneously has literally no incentive for transparency. Totally acing that whole free flow of information thing, Science!

Actually, the issue isn't so much free flow of information to the public as it is open communication of results among experts. Basically the logic goes that you do have incentive to cheat to get ahead, but others in your field have incentive to root out stuff you've done that's bad so they can get ahead of you. And any reputable journal will submit everything to peer review before it goes to press, where people get to have a field day with this sort of thing (of course, that clearly breaks down from time to time).

Also, if you claim something too wild you'll almost definitely get caught, since everyone will want verification of the results (or worse yet, expect to use your "results" as part of some ongoing research result). As people have pointed out, there are some individuals who manage to wrap themselves tightly enough in bureaucracy and legal snarls to cover themselves even if they get found out, but most people will get in trouble.


Originally Posted by MadZuri
There are some interesting and meaningless correlations (http://tylervigen.com/) that you can make from data sets. Use with caution and a sense of humor.

Not sure it's entirely on-topic, but some of those are great :smallbiggrin:

Eldan
2014-05-12, 02:47 AM
"Nicolas Cage movies ~ Deaths by drowning in Swimming pools" may be one of my favourite correlations ever. Even if it's only 0.66.

SiuiS
2014-05-12, 03:05 AM
Originally Posted by the_druid_droid
Actually, the issue isn't so much free flow of information to the public as it is open communication of results among experts. Basically the logic goes that you do have incentive to cheat to get ahead, but others in your field have incentive to root out stuff you've done that's bad so they can get ahead of you. And any reputable journal will submit everything to peer review before it goes to press, where people get to have a field day with this sort of thing (of course, that clearly breaks down from time to time).

Aye. Mostly I'm concerned about science getting in its own way, because the business, the action, and the ideal all need different things. Mostly it's that I'm looking at the dark side of things. The light side is there, but that's not what I'm pointing and laughing at.

My preference is that the weaselly bureaucratic stuff gets so caught up chasing its own tail it gets tied down and lets adults do their thing unhindered. While that's in the process though, it's frustrating because you get to watch human stupidity at full volume with subtitles from the very people who really should know better.


Originally Posted by the_druid_droid
Also, if you claim something too wild you'll almost definitely get caught, since everyone will want verification of the results (or worse yet, expect to use your "results" as part of some ongoing research result). As people have pointed out, there are some individuals who manage to wrap themselves tightly enough in bureaucracy and legal snarls to cover themselves even if they get found out, but most people will get in trouble.

Yup! It still happens though, and it makes me sad. I suppose there is a benefit, you've got to be sharp to navigate the morass well or sharper to be straight and narrow and still succeed. It's when the process gets bogged down by procedurals to maximize the procedurals, instead of succeeding, that I scoff.

I am now very self conscious of the amount of notes I have on how to get my notes together to get my life together though. Hahaha.

the_druid_droid
2014-05-12, 12:10 PM
Originally Posted by Eldan
"Nicolas Cage movies ~ Deaths by drowning in Swimming pools" may be one of my favourite correlations ever. Even if it's only 0.66.

My personal favorite was something like "Biomedical PhDs vs. Deaths by Alcohol Poisoning". It's something like 0.99 or 0.98, too!


Originally Posted by SiuiS
Aye. Mostly I'm concerned about science getting in its own way, because the business, the action, and the ideal all need different things. Mostly it's that I'm looking at the dark side of things. The light side is there, but that's not what I'm pointing and laughing at.

My preference is that the weaselly bureaucratic stuff gets so caught up chasing its own tail it gets tied down and lets adults do their thing unhindered. While that's in the process though, it's frustrating because you get to watch human stupidity at full volume with subtitles from the very people who really should know better.

Yeah, at the end of the day, science is a human endeavor conducted by humans, so it's going to break down from time to time. The primary saving grace is that everyone is trained to be skeptical, with the (hopeful) result that any fraud ends up being less dramatic and smaller-scale than it would be in other arenas of human life, simply because you've either had to fly under the radar with smaller lies, or other people have caught on to your big one.

NichG
2014-05-12, 01:12 PM
Honestly, things like fraud and unethical practice are far less of a problem in science (or at least the fields I've seen) than the problem of selection bias vs relevancy. Essentially the way this goes is, no one bothers publishing negative results because getting a paper published is a significant time investment and there's very little interest for most researchers to read about what failed. The problem then is, if you only publish positive results then it's hard to accurately gauge the rate of false positives (for example... (http://xkcd.com/882/)).
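
The arithmetic behind that worry is short. As a minimal sketch, assuming the tests are independent, the effect being tested is not real, and the usual 0.05 significance threshold is used (the function name is made up):

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests of a
    true null hypothesis comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** num_tests


for n in (1, 5, 20, 100):
    chance = chance_of_false_positive(n)
    print(f"{n:>3} tests: {chance:.0%} chance of at least one false positive")
```

At 20 tests the chance is already roughly 64%, so if only the one "hit" gets written up, a reader has no way to tell it apart from a real effect.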

But now let's say we incentivized people to publish their negative results too. That would cause a glut of papers that, for most researchers, would be pretty useless. That means even more time spent digging through chaff to find a couple papers that actually help with one's research. It's already the case that people barely skim most papers - publishing something is more of a way to archive information for the handful of researchers that do need to reference your work, rather than a way to make people aware of your work.

If we come back to the OP's question, a database of negative results where people could just quickly make a note of what didn't work might actually be a good use for that sort of centralized project (at least within specific subfields where there are large numbers of nearly equivalent experiments, so the statistics are meaningful - things like drug trials, for example).

Knaight
2014-05-12, 03:14 PM
But now let's say we incentivized people to publish their negative results too. That would cause a glut of papers that, for most researchers, would be pretty useless. That means even more time spent digging through chaff to find a couple papers that actually help with one's research. It's already the case that people barely skim most papers - publishing something is more of a way to archive information for the handful of researchers that do need to reference your work, rather than a way to make people aware of your work.

This is where things like metastudies that include unpublished data come in. They do lag behind by necessity (they need a number of studies to have been done on a topic), and we could use more of them than we actually have, but they do help with selection bias in publications. In particular, a result showing that the consensus of published work is wrong gets attention, even if it's wrong only because nothing interesting is actually happening.

Telonius
2014-05-13, 10:40 AM
Availability depends somewhat on the field. Things like the RCSB Protein Data Bank, AddGene, GenBank, EMDB, and the Worldwide Protein Data Bank do exist.

One other big problem is that deposition in one of those databases tends to take a long time. So even if an author does everything they're supposed to, it might be months before the depository actually lists the data.

Grinner
2014-05-13, 02:49 PM
Not sure it's entirely on-topic, but some of those are great :smallbiggrin:

That could not be more on topic; that is specifically what the program I had in mind would be intended to accomplish. :smalltongue:

TuggyNE
2014-05-13, 11:43 PM
Basically the logic goes that you do have incentive to cheat to get ahead, but others in your field have incentive to root out stuff you've done that's bad so they can get ahead of you.

I can understand that in theory, but I'm not entirely convinced it works quite so smoothly. For one thing, what researcher ever got famous by proving someone else had been unethical? Blowing the whistle on so-and-so does, certainly, reduce competition in general, and may even improve one's own standing a bit, but is it nearly so compelling a result as "such-and-such has been proven"?

Maybe I'm just too cynical.

the_druid_droid
2014-05-14, 12:07 AM
I can understand that in theory, but I'm not entirely convinced it works quite so smoothly. For one thing, what researcher ever got famous by proving someone else had been unethical? Blowing the whistle on so-and-so does, certainly, reduce competition in general, and may even improve one's own standing a bit, but is it nearly so compelling a result as "such-and-such has been proven"?

Maybe I'm just too cynical.

Actually, there are few better ways to piss off your competition in science than to beat them to the punch, so if you come up with some big new result, you can expect to have it gone over with a fine-tooth comb by everyone else who wants you to be wrong (or even just not-quite-bulletproof in your logic) so they can be right. And this isn't even necessarily done to show you've been unethical or fraudulent, just premature in your conclusions so the window of opportunity has a chance to stay open.

Not to mention the peer review process, etc. that's required to get published in the first place. To touch on what someone else in the thread has said, if you were going to ask me to put my money on a bet between, say, the scientific review process and the corporate review process in terms of openness and resiliency to fraud, I know exactly where my dollar's going.

SiuiS
2014-05-14, 12:57 AM
I can understand that in theory, but I'm not entirely convinced it works quite so smoothly. For one thing, what researcher ever got famous by proving someone else had been unethical? Blowing the whistle on so-and-so does, certainly, reduce competition in general, and may even improve one's own standing a bit, but is it nearly so compelling a result as "such-and-such has been proven"?

Maybe I'm just too cynical.

You're missing the magical subtext. "This is wrong! You're wrong! Ha! You don't get credit for this! (Now to pick up your research where you left off and get the credit myself, sucker!)" :smalltongue:

Although realistically I have no idea if that's accurate or just the cynical expression of bitter scientists who don't like the system. *shrug*


Which reminds me; have we ever figured out where this cave is? Or the equivalent; the servers, the corporate office, whatever? Whenever I meet a scientist I'm always like "what? No, you're not a scientist. You maybe do science professionally, but... well, yes, that is what a scientist does, but... Well, okay yeah. Fine."



Not to mention the peer review process, etc. that's required to get published in the first place. To touch on what someone else in the thread has said, if you were going to ask me to put my money on a bet between, say, the scientific review process and the corporate review process in terms of openness and resiliency to fraud, I know exactly where my dollar's going.

Where all my money goes! Someone else's pocket :smallwink:

warty goblin
2014-05-14, 09:28 AM
Which reminds me; have we ever figured out where this cave is? Or the equivalent; the servers, the corporate office, whatever? Whenever I meet a scientist I'm always like "what? No, you're not a scientist. You maybe do science professionally, but... well, yes, that is what a scientist does, but... Well, okay yeah. Fine."


The science cave is like, totally secret and hidden, yo. Which is too bad, because it's awesome in here. We've got the real data on the Kennedy assassination, the full Watergate tapes, copies of all the diplomatic letters the Mars Rover's been sending back from the Martians, you name it. Once we convinced the ecologists that they'd gotten all the data they needed on the effects of releasing maneating tigers in a room full of computer scientists, there's been a lot less blood too. There's a free bar too, but it's always surrounded by social scientists, so it's bloody impossible to take a drink without filling out about a dozen different 'random' surveys.

Eldan
2014-05-14, 09:35 AM
Once we convinced the ecologists that they'd gotten all the data they needed on the effects of releasing maneating tigers in a room full of computer scientists, there's been a lot less blood too.

That's true. We now study the population effects of releasing Africanized bees in the chem lab.

Also, we prefer the term "Science Lair".

TuggyNE
2014-05-16, 04:58 AM
Update: A good example of science working as intended (http://www.businessinsider.com/gluten-sensitivity-and-study-replication-2014-5)!


Actually, there are few better ways to piss off your competition in science than to beat them to the punch, so if you come up with some big new result, you can expect to have it gone over with a fine-tooth comb by everyone else who wants you to be wrong (or even just not-quite-bulletproof in your logic) so they can be right. And this isn't even necessarily done to show you've been unethical or fraudulent, just premature in your conclusions so the window of opportunity has a chance to stay open.

Hmm. I guess I can see that.


Not to mention the peer review process, etc. that's required to get published in the first place. To touch on what someone else in the thread has said, if you were going to ask me to put my money on a bet between, say, the scientific review process and the corporate review process in terms of openness and resiliency to fraud, I know exactly where my dollar's going.

The glowing accolades, they overwhelm! :smalltongue: (I don't think much of corporate reviews either.)