Wednesday, 12 July 2017

Want to help on Jaka's Story?

Sean Michael Robinson:

Edit on 7/13/2017--

Thank you thank you thank you to David Roel for pointing me to a much better OCR solution than the two I was using! The Jaka's Story text formatting presented major problems for both Vuescan's OCR and Adobe Creative Suite's OCR techniques, but was handled easily by the fantastic OnlineOCR.net. A few minutes figuring out their interface, and $7.99 later, and I was in business, with only light amounts of adjustment to be done on the outputted text.

For anyone looking for an OCR solution, you might want to start there.

Leaving the rest of this post here, but I'm no longer looking for help on this. Thanks again David!

p.s. The complete Jaka's Story text, including the introduction, is just shy of 30,000 words. Just FYI!

Previously, I wrote...

Well hello again! Fancy meeting you here!

Are you a fan of Jaka's Story (or Cerebus in general)?

Do you know how to do very basic text editing and formatting?

Do you have a few hours of time to spare sometime in the next two weeks?

I'm looking for volunteer(s?) who might be interested in helping out the restoration effort by doing some basic formatting on the Jaka's Story text, prior to that text being passed on to Jeff Seiler for copy editing.

I've scanned the entirety of the text, done the initial OCR (optical character recognition) work, and used various find-and-replace tricks to get rid of as much of the text weirdness as possible. But owing to the various text formatting choices in the original book, the size and amount of fill-in on the text, and other oddities, there's still some formatting to be done, namely, adding in paragraph/line breaks and deleting various junk characters added by the OCR. Based on my formatting work so far, I'd estimate there's somewhere between five and eight hours of work to do on this.

Here are two screenshots, so you know what you might be getting yourself into:


The top one is the actual formatted text, with a single line break to indicate a paragraph break, and the additional line spacings resulting from the funky formatting of the original text removed. The bottom is how the majority of the text stands as of now.
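For anyone curious what that cleanup amounts to, here's a rough Python sketch of the idea. It is not the process I actually used. It assumes the raw OCR output separates paragraphs with one or more blank lines and hard-wraps the lines inside each paragraph, and the `JUNK_CHARS` set and the `clean_ocr_text` name are made up for illustration (the real stray characters depend on the OCR engine and the scan).

```python
import re

# Hypothetical set of stray characters; the real junk depends on the OCR engine.
JUNK_CHARS = "|~^`"


def clean_ocr_text(raw: str) -> str:
    """Return text with one line per paragraph and assumed junk removed."""
    # Delete every character listed in JUNK_CHARS.
    text = raw.translate(str.maketrans("", "", JUNK_CHARS))

    # Assumption: a run of one or more blank lines marks a paragraph boundary.
    blocks = re.split(r"\n\s*\n", text)

    paragraphs = []
    for block in blocks:
        # Join hard-wrapped lines and collapse repeated whitespace.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append(joined)

    # A single line break indicates a paragraph break.
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "IT WOULD BE inac|curate\nto say that~ she heard\n\n\n\nLet go.\n"
    print(clean_ocr_text(sample))
```

A pass like this only handles the mechanical part. Deciding where a paragraph really breaks when the OCR has dropped the blank line still takes a human eye.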

I would love the help, and would be happy to add you to the credits for the finished book.

ALTERNATELY-- are you an OCR expert who believes you could transform my scans into a product much closer to the desired result? Please, contact me, either here in the comments, or by email: cerebusarthunt at gmail.

Thanks so much for your time everyone!

5 comments:

David Roel said...

I use onlineocr.net. Results are great.

David Roel said...

Here's a quick page, using onlineocr.net, completely at random (Cerebus 121, page 8).

IT WOULD BE inaccurate to say that she heard a voice; for the communication (if such it was) came from somewhere within her unrelated to her auditory sense; it took the form of a kind of profound (profane?) ripple of self-confidence. A concentric sensation rising out of an unknown (and unknowable) source. It was soothing (this voice) bringing with it subtle power and an unaccustomed calm so that even the ever-present moisture on her palms and soles of her feet evaporated in its wake.

Let go.

It would be equally inaccurate to say that she obeyed the 'voice' without hesitation, for hesitation there was; but, it was a hesitation as fleeting as it was free of anxiety. It would be more accurate (though by no stretch of the imagination precise) to state that the fingers of Jaka's right hand were more fully-attuned and responsive to her new awareness than was her own conscious mind and that they had (in a manner of speaking) lost patience with their Mistress' indecisiveness. Suffice to say (irrespective of cause), the effect remained. Her fingers straightened of a sudden, almost in the manner of a muscle spasm (though too slowly for that to be a fitting explanation). In the moment that followed, the shadow of her hand detached itself, fingertip by fingertip, from her person and in the gesture of someone drowning, slid smoothly, vertically, gently, down the pitted surface of ancient wood. A second later, it paused, a mere foot or so from the filthy checker'd tiles.

Though it seemed to Jaka that each fibre of her being strained to hear that one, awful, final and reverberating note of impact; though a terrible thrill raced through her even as she contemplated (could virtually hear in that moment) Nurse's bellow of righteousness wounded, morality betrayed; no sound came to the young woman's ear. Whether attributable to the Locked Door's balance being less precarious that it had seemed; whether some arcane device; some lost miracle of hinge manufacture forgotten through the ages, had arrested the door's momentum; or whether some forgiving, compassionate and potent deity had intervened to spare her Nurse's wrath, Jaka did not know.

She stood in that same attitude, her right arm out-stretched for several seconds, uncertain of her next move; in many ways still anticipating the plaster-cracking concussion that never came.

At last, muscle fatigue in her shoulder decided the issue and she lowered her right hand to her side.

David Roel said...

It took some work to get it into shape, but the initial result was a lot better than what it looks like you're getting. No italics, obviously, and I didn't check too closely to see if any of those semi-colons are actually colons. Is Jeff Seiler's copy-editing going to include grammar checking? "...balance being less precarious that it had seemed..." should almost certainly be "than it had seemed". (Jaka's Story has a lot of that in there -- "as was her want" instead of "wont", etc.)

Cerebus Restoration said...

David,

You're a hero. That site works better than both commercial options I've been using. Thank you so much for your contribution! More soon....

Jeff Seiler said...

David--Yes, I check for typos and grammar and punctuation, first and foremost. Secondarily, I read for context and then make suggestions if it seems weird. Dave has taken and used some contextual suggestions, but has left some unchanged.

Sometimes, I wonder whether there will ever be a definitive "final proof" of the master opus. I have the 14 "final proofs" of CIH?, but they also require some corrections. (Sorry, Dave and Sandy...) Most pertinent is the correction of the spelling of Jimmy Buffet...er, Jimmy Buffett's...last name in the BATVARK #1 (!) issue.

Yes, "Jaka's Story" wonts...er, wants...a lot of grammatical, typographical, and sentence structure corrections. I've got the vast majority of them written out by hand on legal paper--the paper is getting brittle by now, since I did it sometime last year--and soon I will receive the reformatted pages from Sean. And, then, I'll do it all over again.

That's gonna be a nice, fat check from Dave, even though I usually cut him a discount because...wait...why do I cut him a discount? That's just bad business. I mean; *his* mistakes, right?

So, why...um...*why* do I cut him a discount?

Oh, yeah. 'Cause I get my name in the book (right, Sean?) and because...well, here's where it gets a little squishy...

I really like the guy. He's a really good guy, despite what you may have heard.

Plus, when I told him what scale is for proofreaders/voiceover actors, he decided to err on the side of paying me the high end.

A really good guy.

When I get the reformatted pages from Sean, I will post a "making of the corrections" update here.

Okay, Jimmy Buffett at Wrigley Field on Saturday night. Carmen says, "Ay caramba! Peace out!"

See y'all on Sunday or Monday.

Pictures may or may not follow.

If they do, protect your retinas.