Wednesday 16 March 2016

READS TYPE

DAVE SIM:
Just a heads-up to Jeff S that the text-as-word-document for READS -- and your returned scanned corrections -- should be on the way to you from Sandeep shortly.

I'm going back and forth on the idea of getting Sean to do Notes at the back.  What would he have to say about THE MOST CONTROVERSIAL GRAPHIC NOVEL EVER PUBLISHED?!

The fact that that technology exists -- that you can scan text and convert it into a word document -- is pretty amazing.

Is there any chance of getting some volunteers to scan ALL of the text from the back of each issue using that technology?  It seems to me that would be a big leap forward for research capability.  You could do as wide or as narrow a search as you wanted and have an exhaustive list of references to that subject.

The more volunteers, obviously, the better, since you're talking about a mammoth pile of scanning.  Agree ahead of time what issues each volunteer is doing and then pool the results.

9 comments:

Dave Kopperman said...

OCR (optical character recognition) has existed for the better part of a quarter century, so Glinda's going to say something about always having been able to scan to text.

At any rate: yes. I do volunteer myself for a batch (let's say 10 issues - my personal collection starts with #81).

Jeff Seiler said...

I think the first question would be: Would such back-of-the-book scanning have to be done with a Sean-level scanner, or would any old dime-store scanner do, as long as it turned out to be legible?

Jeff Seiler said...

I assume Dave is talking about the Aardvark Comment pages and the essays. But aren't all of the essays available at cerebusfangirl.com, and, thus, already digitized?

Of course, I think a lot of typos occurred in the course of transcription, so I guess scanning them would allow for corrections. Right? Or could those digitized files be downloaded directly from cerebusfangirl.com and then corrected?

Jeff Seiler said...

Also, Dave, I'm looking forward to getting back to work on Reads, right after this weekend's St. MinneSomeplace in Paradise Parrothead Club's (say that three times fast!) annual charity fundraising and live trop-rock event is over. I should be able to git 'er done in less than a week, God willing.

Jeff Seiler said...

Oops, looking back, what I meant was, if the essays archived at cerebusfangirl.com have transcription typos (which invariably happens--no pejoratives here), then scanning directly from the issues would reduce those and allow for quick fixes of original typos. Right?

Jeff Seiler said...

BTW, one of the charity fundraising gift baskets that I am putting together for this weekend's event will contain my extra copy of the remastered Church & State volume I, along with most of the run of Cerebus Bi-weeklies from issue #1 up to issue #52. Tim, at the College of Comic Book Knowledge, here in Minneapolis, gave me a 15% discount on the bi-weeklies--85 cents per copy.

Another basket will have four issues of Mouse Guard, the first book of Owly, and the first two Scholastic color reprint volumes of Bone. Tim gave me the same discount on those.

Go, independent comics!

Let's hope we get more young fans interested in the indies! Or, semi-indies.

Kit said...

scanning directly from the issues would reduce those and allow for quick fixes of original typos. Right?

OCR is less reliable than transcription.

Jeff Seiler said...

Yeah, Kit, (and thanks for chiming in), I agree.

The trench work of getting each word and every phrase, every inflection, just right, is much harder.

But, typing is, well,

Worth it.

I do it, at length, and it ends up being a nice night off from the regular routine.

The inflection?

Well;

Ask Dave.

Dave Kopperman said...

Some (not all) OCR programs can learn to recognize certain characters, so while the initial scans may need some serious poring over, the fact that AV Comment and all the text pieces up front use largely the same font (face AND size) for the run of the book would make it as close to an automated process as possible.