Anyone want to do some work for me?



Rawhide
2010-09-11, 10:35 AM
Unpaid work, that is.

There is a document in PDF format that is really difficult to read because of its layout. It has been scanned and OCRed: the original scan is shown as an image on the top layer, with the OCRed text hidden underneath. This text can be selected and copied.

I need to read this document and can't as it is; it makes my eyes hurt. To read it, I need the text copied and pasted into a plain-text (.txt) document in a single column (of course, plain text can't do columns anyway), with all the extra line breaks removed and the text re-paragraphed. It's the re-paragraphing that will take the longest.

From that I'll put it into a Word document and reformat it, but I only want to receive a plain text document (.txt).

Anyone interested in helping me out? I'll understand perfectly if nobody even considers raising a finger.

The original is here:
http://theory.stanford.edu/~ninghui/courses/Fall03/papers/clark_wilson.pdf
Warning: Contains fully justified, monospaced text in columns. Will make your eyes bleed.

Glyphic
2010-09-11, 10:42 AM
I've been meaning to work on my typing skills, but I'm unsure when I can complete this task. Do you have a deadline?

Rawhide
2010-09-11, 10:50 AM
> I've been meaning to work on my typing skills, but I'm unsure when I can complete this task. Do you have a deadline?

Well, there's thankfully no need to retype it; the text is there to be copy-pasted (with some OCR errors). The extra line breaks can be removed easily enough with a find and replace, but that removes all line breaks, including the ones between paragraphs. Hence the need to re-paragraph it. I was starting to do that myself, but I now really need to get some sleep.

It's for my study and if no one has completed it by the time I wake up, I'll get back into doing it myself. Not that I really have the time to be doing that, but I can't expect anyone else to have it either or to put that much effort into a favour. If someone has the time and inclination, it would be very much appreciated, but otherwise it's something I'll have to do myself.

SensFan
2010-09-11, 10:50 AM
Assuming I understand what you want done, it shouldn't take too long. I can start working on it.

EDIT:
Yeah, this is going fine. I can finish up quickly enough. How do you want the text sent to you once I finish?

SensFan
2010-09-11, 11:39 AM
Double-posting so that Rawhide notices the thread.

Done. Just let me know how you want me to send you the file.

Trog
2010-09-11, 01:57 PM
Er... I'm pretty sure Acrobat reader will allow you to Save As Text, actually. Which might solve most of your problem right there.

SensFan
2010-09-11, 05:44 PM
I tried that first, Trog. Murders the readability of the text even more than a straight copy-paste.

Trog
2010-09-11, 05:54 PM
> I tried that first, Trog. Murders the readability of the text even more than a straight copy-paste.
Ah. Bummer. Well, worth a shot, anyway. *shrug*

Zeb The Troll
2010-09-11, 06:03 PM
> Double-posting so that Rawhide notices the thread.
>
> Done. Just let me know how you want me to send you the file.

If he hasn't responded to you yet, try sending him a PM.

Vaynor
2010-09-11, 06:13 PM
He probably didn't expect it to be done so fast. :smalltongue:

Coidzor
2010-09-11, 06:18 PM
Also, he was going to hit the hay when he posted this thread. It is possible he hasn't gotten back online since he took his nap/slept for the night.

SensFan
2010-09-11, 06:35 PM
> Also, he was going to hit the hay when he posted this thread. It is possible he hasn't gotten back online since he took his nap/slept for the night.
This is my thought.

Rawhide
2010-09-11, 07:49 PM
I'm awake and online now. (And I would like to thank you very much!)

Do you have any place you can upload for me to download? If not, I'll PM you an email address (to avoid spam harvesting).

Rawhide
2010-09-11, 09:11 PM
Thank you very much, text file received. There is still a lot of OCR fixing I'll need to do (something I specifically did not ask nor want you to do), but you have helped me out greatly!

SensFan
2010-09-11, 09:39 PM
> Thank you very much, text file received. There is still a lot of OCR fixing I'll need to do (something I specifically did not ask nor want you to do), but you have helped me out greatly!
Glad I could help. There may be a few mistakes of missing characters or (more likely) spaces throughout the file. I couldn't figure out how to use Replace to remove the linebreaks, so did it with "Backspace, space, repeat". It's likely that I messed up at some point.

Hope there isn't too much work for you still to do.

Rawhide
2010-09-11, 09:45 PM
> I couldn't figure out how to use Replace to remove the linebreaks, so did it with "Backspace, space, repeat".

Eep. That would have been a fair bit of work there...

With Word, you use ^p to find (or replace text with) a paragraph mark. You can also click More then Special to insert all sorts of special characters. (Don't confuse a paragraph mark with a paragraph character. The latter is just the visible symbol.)
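For anyone who would rather script it than use Word, here is a rough Python sketch of the same idea. It assumes the pasted text still has a blank line between paragraphs, which a copy-paste from a scanned PDF may well not give you; in that case the re-paragraphing still has to be done by hand. The function name `unwrap_paragraphs` is just something made up for this example.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Assumes paragraphs are separated by at least one blank line.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split wherever one or more blank lines occur (paragraph boundaries).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within each paragraph, collapse line breaks and runs of spaces
    # into single spaces.
    unwrapped = [" ".join(paragraph.split()) for paragraph in paragraphs]
    # Put one blank line back between paragraphs.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    sample = "The first line\nwraps here.\n\nSecond paragraph\nalso wraps.\n"
    print(unwrap_paragraphs(sample))
```

It does not try to rejoin words hyphenated across a line break, since telling those apart from real hyphens needs a human eye (or a dictionary).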

Rawhide
2010-09-11, 11:59 PM
Finished proofing it (for OCR errors and the occasional extra linebreak), reformatting it (including re-sorting the document so that all the words fall on the same page numbers) and reinserting the picture.

Thank you very much, this has been most helpful.