Online coding tutorial for R



JeenLeen
2013-05-14, 10:01 AM
Does anyone here know a website that has a decent and free tutorial for R programming?

I've found some reference material on R's website, but what I would really like is something that walks through a few basic projects: in particular, reading a file from the computer; doing some statistical manipulation, summarizing, and charting of the data; and writing the output to a .csv. When I'm learning a programming language, it helps me a lot to go through the code before using it in a project of my own.

If anyone knows of anything close, please share.

For reference, here is the Wikipedia article: http://en.wikipedia.org/wiki/R_(programming_language)

warty goblin
2013-05-14, 10:40 AM
The only way I know to learn R is suffering. Suffering and endless googling.

This (http://cyclismo.org/tutorial/R/) seems to have a reasonable walkthrough for basic stuff though.

If you're just doing data manipulation and graphical analysis, R is fairly straightforward. I tend to use ggplot2 instead of the built-in functionality for graphical work, but that's mostly a matter of habit and familiarity. It's when you want to do, say, mixed models that R starts to get cumbersome, and if you're analyzing factorial designs R really has no business existing in the same universe as SAS.
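For the kind of basic project described in the opening post (read a file, summarize and chart the data, write a .csv), a minimal base-R sketch might look like the following. It assumes a hypothetical CSV with a "group" and a "value" column, uses base graphics rather than ggplot2 so no extra packages are needed, and writes a tiny made-up input file to a temporary location first so it runs on its own.

```r
# Minimal sketch of a basic R workflow: read a file, summarize, chart, write a .csv.
# The column names "group" and "value" are hypothetical; temporary files stand in
# for real paths so the example is self-contained.
input_file  <- tempfile(fileext = ".csv")
output_file <- tempfile(fileext = ".csv")
chart_file  <- tempfile(fileext = ".pdf")

# Create a tiny made-up input file
writeLines(c("group,value", "a,1.5", "a,2.5", "b,4.0", "b,6.0"), input_file)

# 1. Read the file from disk into a data frame
dat <- read.csv(input_file, stringsAsFactors = FALSE)

# 2. Summarize: overall summary, then the mean of value within each group
print(summary(dat))
group_means <- aggregate(value ~ group, data = dat, FUN = mean)
print(group_means)

# 3. Chart the group means as a bar plot, saved to a PDF file
pdf(chart_file)
barplot(group_means$value, names.arg = group_means$group, ylab = "Mean value")
invisible(dev.off())

# 4. Write the summarized results out as a .csv
write.csv(group_means, output_file, row.names = FALSE)
```

Swapping the temporary paths for real file locations (and the column names for real ones) is all it takes to point this at actual data.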

Finlam
2013-05-14, 10:48 AM
I'd agree with the suffering + Googling bit.

Then there's this (http://cran.r-project.org/doc/manuals/R-intro.html), which will tell you everything you want to know.

For beginners, there's this (http://tryr.codeschool.com/levels/1/challenges/1), which is corny, but interactive and potentially useful.

Finally, there's this (http://ww2.coastal.edu/kingw/statistics/R-tutorials/); it's less pretty, but informative and to the point on whatever you want to use R for.

Hopefully one of these will help, though it will probably be as a few drops of water on a burning man. Good luck!

Serpentine
2013-05-15, 10:38 AM
Oh gawd. R. Run away! Run away while you still can!
I guess I might still have some of my workbooks on it from STAT100, if I didn't burn them after finishing it... Probably won't help you much, though, cuz I'm not gonna copy it all online and probably can't be bothered mailing it to you :smalltongue:

But still. Ugh. R.

warty goblin
2013-05-15, 02:50 PM
You used R in Stat 100? Somebody in your stat department needs their head X-rayed. Statistics is already agony for the non-mathematically inclined, and painful for plenty of people who are good with math; the last thing the class needs is something as miserably aggravating as R.

Seriously, one of the most painful things about being a graduate student in statistics is the inevitable glob of open-source fanatic R fanboys/girls that infest the department. Every time you turn around they're excitedly blathering about how R can now half-assedly do something incredibly basic unless you sneeze at the wrong moment. Never mind that the documentation is written by the module authors, for the module authors, and the error messages are essentially exercises in cryptography.

Eldan
2013-05-15, 04:06 PM
I'm not exactly sure what the 100 in Stats 100 means, but we did use it in our first statistics course as well. Along with SPSS, sure, but we still had it. And an extra lecture called "Use of R" in our third year.

It's a horrible program, but you need it.

Seriously. In every interview I've ever had, one of the first questions was "How good are you in R?" And I'm a biologist.

Razanir
2013-05-15, 10:50 PM
Oh gawd. R. Run away! Run away while you still can!
I guess I might still have some of my workbooks on it from STAT100, if I didn't burn them after finishing it... Probably won't help you much, though, cuz I'm not gonna copy it all online and probably can't be bothered mailing it to you :smalltongue:

But still. Ugh. R.

Why? From my experience, it's actually a decent language... well, once I got past the hideous <- for assignment. Thankfully it also recognizes = as an assignment operator.
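A short illustration of the <- versus = point: at the top level both assign, but inside a function call = names an argument instead of assigning. The variable names here are arbitrary.

```r
# At the top level, `<-` and `=` both assign a value to a name
x <- 5
y = 5
stopifnot(identical(x, y))

# Inside a function call, `=` matches an argument by name; it does not
# create or change a variable called x in the workspace
m <- mean(x = c(1, 2, 3))
print(x)   # still 5

# `<-` inside a call assigns z in the workspace (when median() evaluates
# its argument) and also passes the value on to the function
med <- median(z <- c(1, 2, 3))
print(z)   # 1 2 3
```

That difference is one common reason people stick with <- even though = works for ordinary assignment.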

And to all the people shocked at it being in STAT 100: I also learned it in an intro-level stats class. Granted, it was a 300-level class, but it was the intro class nonetheless.

Serpentine
2013-05-16, 03:32 AM
You used R in Stat 100? Somebody in your stat department needs their head X-rayed. Statistics is already agony for the non-mathematically inclined, and painful for plenty of people who are good with math; the last thing the class needs is something as miserably aggravating as R.

It's pretty much all the unit consisted of. It also had a reputation for being ridiculously hard, AND for being repeatedly dumbed down to the point of uselessness.

Why? From my experience, it's actually a decent language... well, once I got past the hideous <- for assignment. Thankfully it also recognizes = as an assignment operator.

Because it's convoluted and unintuitive, and as far as I can tell few people who actually use statistics (at least in the biological and ecological sciences) ever use R if they can use pretty much any other statistics program, of which there are many.

Most of what I know (/knew) about statistics, I learnt not from STAT100, but from Ecosystem Management. STAT100 taught me how to use one pretty awful program. Ecosystem Management taught me how to actually do statistics.

warty goblin
2013-05-16, 07:30 AM
Why? From my experience, it's actually a decent language... well, once I got past the hideous <- for assignment. Thankfully it also recognizes = as an assignment operator.

The language itself is mostly fine. I wouldn't mind if it were more strongly object-oriented, among a few other things, but generally the language itself is fine.

The way it implements a lot of common statistical methods, however, is entirely dumb. For instance, the only way I know to do contrasts in R is to hand-code the matrix algebra; SAS just lets you type in the contrast you want. R's handling of mixed models is also very inconvenient compared to SAS, which gives expected mean squares, automatically calculates approximate degrees of freedom for tests, and generally just makes your life easy. So far as I can tell, getting the ANOVA/method-of-moments estimates of the variance components in R is only doable by pulling out pen and paper and working out the expected mean squares yourself. Type III sums of squares are also remarkably hard to tease out of R, and it's not like they're a particularly obscure thing.
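As a rough illustration of what hand-coding the matrix algebra for a contrast can look like in base R, here is a sketch using the built-in PlantGrowth data (chosen only as an example). It estimates one arbitrary contrast, the average of the two treatment means minus the control mean, from the lm() coefficients and their covariance matrix: the estimate is L b and its variance is L V L'.

```r
# Fit a one-way model with R's default treatment coding
fit <- lm(weight ~ group, data = PlantGrowth)

# Coefficients: (Intercept) = ctrl mean, grouptrt1 = trt1 - ctrl,
# grouptrt2 = trt2 - ctrl. Contrast of interest (an arbitrary example):
# average of the two treatment means minus the control mean.
L <- matrix(c(0, 0.5, 0.5), nrow = 1)

est    <- drop(L %*% coef(fit))                  # contrast estimate, L b
se     <- sqrt(drop(L %*% vcov(fit) %*% t(L)))   # standard error, sqrt(L V L')
res_df <- df.residual(fit)                       # residual degrees of freedom
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), res_df)           # two-sided p-value

print(c(estimate = est, se = se, t = t_stat, df = res_df, p = p_val))
```

The coefficient vector L has to be worked out by hand from the coding scheme, which is exactly the bookkeeping a SAS-style contrast statement does for you.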

And the documentation is mostly useless.


It's pretty much all the unit consisted of. It also had a reputation for being ridiculously hard, AND for being repeatedly dumbed down to the point of uselessness.

Because it's convoluted and unintuitive, and as far as I can tell few people who actually use statistics (at least in the biological and ecological sciences) ever use R if they can use pretty much any other statistics program, of which there are many.

Yep, sounds like Stat 100, except made more obnoxious by R.

JeenLeen
2013-05-16, 10:41 AM
Thank you for the help and resources. I'm already finding them quite useful.
Some staff in my office use SAS, but as it's costly, I'm relegated to R.




For beginners, there's this (http://tryr.codeschool.com/levels/1/challenges/1), which is corny, but interactive and potentially useful.



This link seems not to work. When I go to the base site (http://tryr.codeschool.com/), it loads, but clicking the links doesn't work. Do you know if I need any special settings, or do you reckon the site is just down?

Also, can anyone point me to a good tutorial for reading a fixed-width file (in .txt format) into R, i.e. one that's not delimited by commas or anything else? I found the basic import code and its description, but that means very little to me.
My co-worker who uses SAS uses a line of code to tell SAS where everything is in the file (starting column, ending column, whether it's a character or number, and the name of the variable). Can someone point me to similar code that'll work in R?
(I have 200+ variables, but once I get the format, I can use a formula in Excel to put the variables in the correct format.)
Here is a sample of what is used in SAS:

data &dsname;
infile rawdata lrecl=3300 truncover ;
input
@1 (FieldName1)($4.)
@5 (FieldName2)($7.)
@12 (FieldName3)($40.)
@52 (FieldName4)($40.)
@92 (FieldName5)($1.)
@93 (FieldName6)($1.) ;
run;

warty goblin
2013-05-16, 09:16 PM
Not delimited by anything? Not even tabs or spaces?

If it is split on tabs or spaces, read.table() is your friend and should work just fine. Try read.table('filelocation', header = TRUE).
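A small sketch of read.table() on whitespace-separated data with a header row. The data are made up and passed inline through the text argument instead of a file; with the default sep = "", any run of spaces or tabs counts as a single separator. TRUE is spelled out because T is an ordinary variable that can be reassigned.

```r
# Made-up whitespace-separated data with a header row (spaces and a tab mixed)
txt <- "id score group
1 3.5 a
2\t4.0 b
3 2.5 a"

# sep = "" (the default) splits on any run of spaces or tabs
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(dat)
```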

If it really is fixed width, a bit of googling reveals this (http://www.astrostatistics.psu.edu/su07/R/html/utils/html/read.fwf.html). I've not used it, but it looks fairly straightforward.
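To get a feel for read.fwf(), here is a toy example with made-up records: widths gives the length of each field in characters (not its starting column), and a negative width skips that many columns. The input is a temporary file created on the fly.

```r
# Two made-up fixed-width records: a 2-char code, a 2-digit number,
# one filler character to skip, then a 3-digit number
f <- tempfile(fileext = ".txt")
writeLines(c("AB12x345", "CD67y890"), f)

# widths are field lengths, not start positions; -1 skips one column
dat <- read.fwf(f, widths = c(2, 2, -1, 3),
                col.names = c("code", "n1", "n2"),
                stringsAsFactors = FALSE)
print(dat)
```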

One piece of general advice, which I find particularly useful in R: if you're trying to do something big, make a very small fake version of the problem first, small enough that you can verify by hand it's doing what you want and not something stupid. Because R's vector handling is actually quite good, a lot of things are much easier than you'd think, but unless you can manually verify R's not being stupid, it's hard to get a sense of how powerful they really are.
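Following that advice, here is a small fake version of a "flag values above a per-group cutoff" task (the task, values, and cutoffs are all invented). It is small enough to check entirely by hand, and the vectorized one-liner is compared against an explicit loop.

```r
# Tiny, hand-checkable fake data
value     <- c(3, 8, 5, 1)
group     <- c("a", "b", "a", "b")
threshold <- c(a = 4, b = 6)   # hypothetical per-group cutoffs

# Vectorized: look up each element's cutoff by name, then compare element-wise
flag <- unname(value > threshold[group])
print(flag)   # FALSE TRUE TRUE FALSE

# Same thing with an explicit loop, to confirm the vectorized version
loop_flag <- logical(length(value))
for (i in seq_along(value)) {
  loop_flag[i] <- value[i] > threshold[[group[i]]]
}
stopifnot(identical(flag, loop_flag))
```

Once the four-row version checks out by hand, the same line works unchanged on a vector of any length.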

JeenLeen
2013-05-17, 08:55 AM
Not delineated by anything? Not even tabs or spaces?
Correct; truly no delimiters. I'm not sure why they decided on that format, but such was the decision.

So to translate what I had from SAS into R, would it be something like

read.fwf("FileLocation&NameGoesHere", widths = c(4, 7, 40, 40, 1, 1), header = FALSE, sep = "\t", as.is = FALSE, skip = 0, col.names = c("FieldName1", "FieldName2", "FieldName3", "FieldName4", "FieldName5", "FieldName6"), n = -1, buffersize = 2000)

#EDIT: At first I put the starting column positions (1, 5, 12, 52, 92, 93) in widths. What should actually go there is the length of each field (i.e., FieldName1 takes up characters 1-4, so its width is 4). Is that correct?

#the file itself does not have the column names in it, but I know what the column names should be from the data file layout
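For the 200+ variable case, one alternative to building the call in Excel is to keep the layout (name, start column, width) in a small table and feed its columns to read.fwf(). This is a minimal sketch under assumptions: the fwf_layout table is a hypothetical stand-in built from the six fields in the SAS sample, every field is read as character (matching the $w. formats), and a made-up record is written to a temporary file for testing.

```r
# Hypothetical layout table mirroring the six fields in the SAS sample.
# With 200+ variables this table could be read from a layout file
# (for example with read.csv) instead of being typed in by hand.
fwf_layout <- data.frame(
  name  = c("FieldName1", "FieldName2", "FieldName3",
            "FieldName4", "FieldName5", "FieldName6"),
  start = c(1, 5, 12, 52, 92, 93),
  width = c(4, 7, 40, 40, 1, 1),
  stringsAsFactors = FALSE
)

# Sanity check: each field should start right after the previous one ends
stopifnot(all(fwf_layout$start == cumsum(c(1, head(fwf_layout$width, -1)))))

# One made-up record, padded to the layout, written to a temporary file
f <- tempfile(fileext = ".txt")
writeLines(sprintf("%-4s%-7s%-40s%-40s%-1s%-1s",
                   "ABCD", "1234567", "Some text", "More text", "Y", "N"), f)

# All SAS fields use $w. (character) formats, so read everything as character;
# colClasses and strip.white are passed through to read.table()
dat <- read.fwf(f, widths = fwf_layout$width, col.names = fwf_layout$name,
                colClasses = "character", strip.white = TRUE)
str(dat)
```

The start column is not needed by read.fwf() itself; keeping it in the table just makes the layout easy to check against the documentation.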



One piece of general advice, which I find particularly useful in R: if you're trying to do something big, make a very small fake version of the problem first, small enough that you can verify by hand it's doing what you want and not something stupid. Because R's vector handling is actually quite good, a lot of things are much easier than you'd think, but unless you can manually verify R's not being stupid, it's hard to get a sense of how powerful they really are.
Thank you. Yeah, before I try anything for real with our big data files, that is definitely something I should do.