Wednesday, 14 December 2016

Cerebus Volume One: the Original Artwork: part 1 of 3

Sean Michael Robinson:


First, thanks so much to all of the backers of Cerebus Archive Number Six for ensuring that this restoration and print project continues to lumber forward.

Good news on several fronts this week. At long (loooong!) last, Cerebus Volume One is headed to the printers. For this book (and possibly for some other books in the future) we'll be printing with Friesens, a Manitoba printer that put together a really great-looking press test a few weeks back. I'm excited to see how the final product looks.

True to form, I continued to pick at the book even as it went out the door, officially making it more than a year in the making, from first invoice and initial work to delivery to the printer. It's an interesting thing. Although it's by far the most modest book visually, it's also the fastest seller, and it's the book that has benefited the most from the restoration effort, as dictated by the condition of the materials.

Back at the end of September I wrote at great length about restoring the bulk of the pages from a combination of newsprint and the seven issues' worth of sub-optimal scans we had of the actual original negative. But I didn't get very much into the original artwork we've acquired through the Art Dragnet, and how it's impacted the book.

I'll save myself some time here and quote myself, from the 6,000-word essay at the back of the new book:

Of the 538 pages of art in the book, 97 pages are now sourced from direct scans of the original art boards. Most of the artwork was sold in the back of the monthly book itself, or at various conventions, but a few pages were retained by Sim over the years. Dozens more were graciously scanned by their owners, both current and former, and submitted to us via email. (Thank you, owners!) And many were scanned by auction houses, sometimes to advertise current auctions, and occasionally upon request. (Thank you, Comic Link and Comic Connect!)

When it comes to detail and pen and ink textures, the original artwork is in a class by itself. Unfortunately, the mechanical tones used to create Cerebus’ fur and a variety of background and textural effects have shrunk over time to a fraction of their former size. Thus every page sourced from original art has had to be restored by digitally copying portions of the surviving tone and flying it in to the missing areas. But the labor involved is more than offset by the improvement in image quality. All of these things taken together mean one thing: this is the single best version of this material under one cover, and as more original artwork continues to come to light, future printings will only improve.

Seventeen of these pages were pages still owned by Dave, and were scanned by Sandeep for this printing. Dozens more were graciously scanned and digitally sent to us by their owners, current or previous, or in many cases, by auction houses, and even by people who had access to large art collections for the purposes of scanning other materials for other print projects and snagged some Cerebus pages while they were at it! (If this all sounds vague to you, suffice it to say that not everyone who sent us a page was the owner of said page, and not everyone wants to be thanked by name, in print or otherwise. Plenty of anonymous sources in several forms, which I do my best to respect.)

SO! With that out of the way, here's a list of all the pages sourced from original artwork, brand new to this printing —

Issue 2 p 1
Issue 2 p 7
Issue 2 p 12
Issue 5 p 7
Issue 5 p 8
Issue 5 p 9
Issue 7 p 1
Issue 7 p 6
Issue 7 p 16
Issue 7 p 19
Issue 7 p 22
Issue 8 p 18
Issue 8 p 21
Issue 9 p 8
Issue 9 p 13
Issue 9 p 14
Issue 10 p 1
Issue 10 p 10
Issue 10 p 11
Issue 10 p 15
Issue 11 p 11
Issue 11 p 15
Issue 11 p 17
Issue 11 p 21
Issue 12 p 16
Issue 13 p 3
Issue 13 p 6
Issue 13 p 7
Issue 14 p 8
Issue 15 p 1
Issue 16 p 4
Issue 16 p 5
Issue 16 p 17
Issue 16 p 18
Issue 17 p 2
Issue 18 p 1
Issue 18 p 9
Issue 18 p 12
Issue 18 p 18
Issue 18 p 20
Issue 19 p 15
Issue 19 p 18
Issue 19 p 19
Issue 20 p 1
Issue 20 p 5
Issue 20 p 6
Issue 20 p 9
Issue 21 p 12
Issue 21 p 18
Issue 22 p 5
Issue 22 p 6
Issue 22 p 17
Issue 22 p 7
Issue 22 p 8
Issue 22 p 12
Issue 23 p 2
Issue 23 p 11
Issue 23 p 12
Issue 23 p 13
Issue 23 p 17
Issue 23 p 19
Issue 24 p 6
Issue 25 p 5
Issue 25 p 13
Issue 25 p 19

And how much of a difference do these scans actually make?

It really depends on the page — what kinds of techniques Dave was using on that particular page, and what kind of shape the alternative sources are in. (Here's a post breaking down the broad differences between these sources, using pages from Church & State II and Jaka's Story as examples.)

Let's start at the back, shall we?

I'm (maybe irrationally) excited every time we get new original artwork emailed to us, but I'm especially excited when it comes from periods of the book when the reproduction was subpar. Issues 23 and 24 are good examples. I have yet to see really good-looking copies of the issues, either the original printings or reprints. The book had just switched printers at the time, from Fairway Press to "Odyssey Press and Southern Dutchess News," and from the looks of it, they used a much pulpier paper than the other press, and were less experienced (or just less skilled) in shooting negatives. This is unfortunate, as Dave had at this time really started to incorporate his horror influences into his inking, to great effect (in the originals, anyway). By the time these negatives were replaced by second-generation dupes for the printing of the first trade collections, many of these textures had filled in completely.

Here's a look at the original art for page two of issue 23. There's a real Jeffrey Jones influence here, it seems to me. You can see from the original how the tree was fleshed out with some tapered pen lines after the initial brushwork. And the drybrush China white (i.e. zinc oxide) across the tree is a great textural addition, adding some more horizontal movement and implication of weather to the page.

At the bottom of the page is an organic texture that Dave had already tried a few times by this point, but that would become a much more overt effect later in the book: the fingerprint. It adds a great touch to the leg of the wounded Cerebus, linking the dragged blood on the snow to the rough dead tree, as the only visible disturbances of the white ground.

Another technique characteristic of these issues is the usage of fine, scratchy hatching, both for form and value, and for creating the implication of a bit of texture — wood, land seen at a distance through a snowstorm. This detail is somewhat present in the first printings but almost nonexistent after that.

This page from later in the issue was submitted by multiple scanners. Thanks so much, Alan and Dean!

More next week!


Sean R said...

And hey, if I didn't make it clear in the post... PLEASE! Send us scans of your originals! Or send me an email and we'll work something out. Cerebusarthunt at gmail.

Mike Battaglia said...

Speaking of digitizing Dave Sim content...

Is there, or has there been, any talk about creating a 'super file' that would house every single word Dave has written in a format that would allow for multiple types of searches and deeper interactivity? I'm talking every letter ever written, every essay, note, and whatever else exists. It would be a massive undertaking (the collected letters of 1990 alone is almost 700 pages if I'm not mistaken).

The process would have to start with utilizing OCR technology (Optical Character Recognition). I think simply scanning the pages as static images isn't good enough, because of the lack of search function, navigability and consideration of future applications, etc.

Any thoughts?

Sean R said...

Hey Mike,
I don't know how wide their net is, but Dave Fisher and Dave (of the Sim variety) have been working with an ultra fast document scanner and some OCR software to put together various projects. Good suggestions. There's certainly a ton of material! I just received a bunch of MINDS issues from Dave and Sandeep in the mail and I cannot believe how many long essays, speeches etc are in there that I've never read or seen. Really interesting stuff.

Mike Battaglia said...

Hey Sean,

There aren't very many people whose life work warrants this type of treatment, but Dave is certainly one of them (in my opinion).

I'm pretty sure there is a group of people doing this for Yogi Bhajan (who died in 2004), as there are 60,000 pages of transcribed lectures that he conducted over a 40-year period. I used to work with some of his colleagues and I seem to recall hearing that something like this was being planned. I could contact them and see what they're using, how they're getting it done, if in fact they are getting it done.

Jeff Seiler said...

You know, Mike, your comment, above, reminds me, vaguely, of something that was noted (by Dave?) many years ago. The network of Cerebus fans and readers (not to mention fans and readers [and supporters] of Dave Sim--often, but not always, one and the same) has a vast array of outside interests and contacts.

I remember, one year at S.P.A.C.E, there was a guy doing a documentary about Dave, Cerebus, and their fans. Dave told us later that the documentarian had remarked that Dave's fans seemed to be of above-average intelligence, as comics fans go. It might have been a case of damning with faint praise, but I didn't take it that way.

The fact, Michael, that you are aware of the project for that Yogi, and can casually work it into the ongoing conversation, is a testament to the documentarian's assertion.

Yay us--Cerebus fans and Simians!

Dave Sim said...

Hi Mike & Sean -- I'm always surprised -- and shouldn't be -- by the number of devoted CEREBUS fans -- like Sean -- for whom there's CEREBUS material that they haven't read. MINDS seems really "late" to me. But "late" is a relative term when someone born the day the first issue came out is turning 40 next year.

Dave Fisher is using a high-speed scanner to scan all of the letters but he's pretty casual about getting the job done and I've been pretty casual about getting him in to do it. As it is, between what he and Sandeep are getting paid every week, I haven't been at all sure that I could afford to have him come in any more often than he already is.

But your exchange here raises an interesting question: SHOULD we be doing the correspondence first? My impression was, "Well, okay, no one has seen the letters, so the letters need to be scanned first." That is, "MOST people have seen the letters pages and essays." But, that probably isn't true.

It might be a situation where we need a team of volunteers with scanners who are willing to scan what THEY have. Depending on how many volunteers we can get, it would be a matter of just divvying up the workload -- make sure no one is duplicating anyone else's work -- and then pooling everything into a massive database with a "character recognition software" component.

That's definitely been my goal: to have everything in the Cerebus Archive (letters, clippings, publications) scanned and "word-searchable" for research purposes.

Gauging the level of interest among CEREBUS fans who have large collections of the original comics would probably be a useful step forward.

Contacting the Yogi Bhajan people might be another useful step forward.

Thanks, Mike!

Travis Pelkie said...

Another cool post.

Although I thought China White was...y'know...the heroin?


While we're talking about B&W greatness, Richard Corben came out with a new book from Dark Horse today, Shadows on the Grave. Just flipped through it, and it (of course) looks great, particularly because it's printed on "newsprint" (quotes because I'm not sure what actually qualifies as newsprint these days). And it smells great, too!


Not to get weird or nothin'.

Mike Battaglia said...

That all sounds like a great idea to me. I've sent out three queries to people in the YB community, and tomorrow I'll call some relevant entities (I'm thinking the Kundalini Research Institute (KRI) would be the best place to start). Will post cumulative findings here. I expect some response-time delays due to general holiday commotion.

Mike Battaglia said...

Quick update:
Already received a response from one Dr. Japa K: all the work has been completed and has culminated in this website:

Please check it out just to see the functionality of it and how well it has been organized, it might be a good template for later down the road.

A quick glance at the "about this site" reveals some great leads. Here's the specific blurb that caught my attention (with potential leads in all caps)--

"At the dawn of the new millenium, technology shifted to allow the vision of an internet-based searchable database become a foreseeable goal. The KUNDALINI RESEARCH INSTITUTE's Board of Directors realized the critical need to preserve the original media in digital format due to natural degradation of original media. KRI began the digitization of video and audio tapes with archival partners SCENE SAVERS and MAHERN ARCHIVAL PRESERVATION in January of 2009. After a thorough pilot project in 2008, work began in earnest, and at the beginning of 2014 was moving toward completion. In 2008, SIRI VED SINGH KHALSA was awarded a lifetime achievement award for his dedication in recording Yogi Bhajan's classes.

KRI worked with DAKOTA SYSTEMS and CONVIVIAL DESIGN to outline the parameters and needs of what would eventually become the The Yogi Bhajan Library of Teachings® Website. The dream of realizing a global teaching tool of the magnitude of The Yogi Bhajan Library of Teachings® continued to take shape. The website design was finally mapped out by AVALON TECHNOLOGY, JOTI SOFTWARE and the database of lectures was compiled to form the searchable database that interfaces with the video and audio recordings.

2013 to Present – Over 6,500 lectures have been transcribed in either first or second form. Hundreds of lectures still remain to be transcribed. As the project of developing a searchable database evolved, Joti Software came on the scene to sculpt and refine the final project. After several months of programming new features and streamlining the administrative capabilities of the website, the Beta Test of The Yogi Bhajan Library of Teachings® rolled out in August 2013. A volunteer team of longtime students and Kundalini Yoga teachers signed on to shake out the "bugs" and see for themselves the culmination of nearly four decades of dedication to preserving the technology shared by Yogi Bhajan." (end excerpt)

Barry Deutsch said...

Dave, for what it's worth, most or all of the letter columns and such - basically, anything that was published in an actual issue of Cerebus - have already been scanned and can be found online.

Of course, the scanning isn't nearly the high quality of the scans for the Cerebus restoration project - but for purposes of OCR software, that may not matter. (Or maybe it does. I really don't know anything about that.)

Barry Deutsch said...

Also, thanks for reminding me that Cerebus #0 exists! I bought one on Ebay, and was reminded that the "Like A Look" issue exists. (That one always cracks me up.)

Mike Battaglia said...

Hi Dave, et al,

I think I’ve acquired all the information there is to acquire on the subject of “how they did it”, in regard to digitizing Yogi Bhajan’s massive output, including labor process/flow and cost.

I had a long conversation this morning with Abhi Raj Singh, the owner of Joti Software (the company that compiled all the data after it had been digitized and created the user interface for the website linked above), and he detailed the proceedings for me:

Since all the lectures were in audio or video format, every single word had to be TRANSCRIBED – we’re talking sixty thousand pages of typing, a massive undertaking that took five years. So there’s your initial process: half a decade of transcribing audio and video (using Microsoft Word). Once they had enough to warrant the next step, they brought all the Word documents to Joti Software. Joti Software then used MarkLogic, a robust platform for translating .txt files into searchable and indexable XML material. (Note: the files need only be clean .txt files, not necessarily Word).
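To make the "clean text in, XML out" step a little more concrete, here is a minimal Python sketch using only the standard library. The element names (`document`, `paragraph`), the `category` and `source` attributes, the function name and the sample letter are all assumptions made up for illustration; this is not how Joti Software or MarkLogic actually structure their data.

```python
import xml.etree.ElementTree as ET


def text_to_xml(text, category, source):
    """Wrap clean plain text in a simple XML document (hypothetical schema)."""
    root = ET.Element("document", {"category": category, "source": source})
    # Treat blank-line-separated blocks as paragraphs.
    for block in text.split("\n\n"):
        block = " ".join(block.split())  # collapse line breaks and extra spaces
        if block:
            ET.SubElement(root, "paragraph").text = block
    return ET.tostring(root, encoding="unicode")


# Invented placeholder text, not a real letter.
sample = 'Dear Dave,\n\nThanks for the "LATE" issue.\n\n'
xml_doc = text_to_xml(sample, category="CORRESPONDENCE", source="letter-0001")
print(xml_doc)
```

Once every document carries the same handful of tags and attributes, any XML-aware database or search tool can index it, which is presumably why the format of the intermediate files matters less than their cleanliness.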

Following that explanation, I asked him what we would have to do to get to the point where we’d be taking that ‘next step’ toward working with a Joti Software or whomever, explaining what we’re working with. No great surprise, he said we’d need to scan everything that hasn’t been scanned, using OCR. He also indicated that OCR can be applied after the scanning nowadays, which was news to me. So if something has already been scanned, it can then be run through OCR, apparently (? I’ll look into that to be certain of it, unless someone else wants to chime in with their expertise on the subject). But the end result has to be clean .txt files, as that is what will be needed to translate into XML, which is what will be used to create the searchable, index-friendly material.

I asked what the hypothetical cost would be for working with Joti Software in this capacity, and he gave me a very rough estimate, for the sake of humoring me (just to be clear, this isn’t a quote of any kind) of $10k. That would be the cost of taking all the .txt files and using them to create a searchable database on par with the one linked above (here it is again just so you don’t have to scroll up). Unfortunately, using a beast of a platform like MarkLogic would run an additional $10k annually (ballpark), but Abhi Raj informed me that using MarkLogic wouldn’t be necessary – it’s just that it’s head-and-shoulders above anything else that exists currently.

At any rate, keeping first things first: simply getting all the letters, essays, notes, and whatever else exists into clean .txt files is going to be the first step, unless there’s a process that I’m unaware of that would be more efficient.

Mike Battaglia said...


Regarding OCR software that can be employed post-scanning: totally commonplace now. I'm sure Dave Fisher knows more about this than I do, but I'm pretty sure that anything he has scanned (along with any other existing scans) should already be prime for OCR conversion, and wouldn't need to be scanned again.

Once converted to searchable text files, I would guess that all the documents would have to be edited before being deemed 'clean', as the OCR technology, like voice recognition, is not perfect. Again, I'm sure D Fish knows more about this than I do, so please correct me if I'm wrong--but I'm thinking this process will require an additional amount of initial legwork: editing the OCR results to ensure they are, word-for-word, in line with the original text.
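One way to picture that proofreading pass: compare the raw OCR output against a hand-corrected version and list every spot where the two differ. The sketch below uses Python's standard `difflib` module; the function name and the sample strings are invented for illustration, and a real workflow would need a person to produce the corrected text in the first place.

```python
import difflib


def ocr_differences(ocr_text, corrected_text):
    """Return (ocr_words, corrected_words) pairs wherever the two texts differ."""
    ocr_words = ocr_text.split()
    good_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=good_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ocr_words[i1:i2]), " ".join(good_words[j1:j2])))
    return diffs


# Invented example of typical OCR slips: zero for "O", "rn" for "m".
ocr = "Cerebus the Aardvark is 0UT in Decernber"
fixed = "Cerebus the Aardvark is OUT in December"
for wrong, right in ocr_differences(ocr, fixed):
    print(f"{wrong!r} -> {right!r}")
```

A list like this also shows which mistakes recur, so the most common ones could be corrected in bulk rather than one page at a time.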

Which brings me to my next concern:
You have a unique language that you've created in employing capital letters, quotations and so forth for emphasis and contrast (creating a gauge of importance that ranges from all lower case to all upper case, further gradated by quotations), and I wonder how much (if any, hopefully none) would be lost in the OCR process (and would, therefore, have to be recreated).
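A rough way to test for that kind of loss, sketched in Python: hand-type a small sample page, run the scan of the same page through OCR, and compare the emphasis markers in each. The regular expressions, function names and sample sentences below are assumptions for illustration only; they catch all-caps words and straight double quotes, and would need extending for curly quotes, italics or "semi-caps".

```python
import re
from collections import Counter

ALL_CAPS = re.compile(r"\b[A-Z]{2,}\b")  # words of two or more capital letters
QUOTED = re.compile(r'"[^"]+"')          # phrases inside straight double quotes


def emphasis_markers(text):
    """Count all-caps words and double-quoted phrases in a piece of text."""
    return Counter(ALL_CAPS.findall(text) + QUOTED.findall(text))


def lost_emphasis(original, ocr_output):
    """Markers present in the original but missing (or fewer) in the OCR output."""
    return emphasis_markers(original) - emphasis_markers(ocr_output)


# Invented sample: the OCR pass has dropped one set of caps and one set of quotes.
original = 'MINDS seems really "late" to me, but SHOULD we wait?'
ocr_output = "Minds seems really late to me, but SHOULD we wait?"
print(lost_emphasis(original, ocr_output))
```

An empty result on a representative batch of pages would suggest the chosen OCR software preserves this layer of the text; anything else gives a list of what would have to be recreated by hand.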

Dave Sim said...

Barry - That's a good point. We can just "pirate the pirates" who have the entire individual issues scanned and just pull out all of the "back of the book" text pieces, make a digital pool out of them, OCR them and then just incorporate the CORRESPONDENCE as that gets scanned. I think we'd need codes for the digital files, indicating what overall category they were in -- CORRESPONDENCE, AARDVARK COMMENT, FOLLOWING CEREBUS, NEWS CLIPPINGS, FANZINES -- for research "sourcing" purposes, but, at least theoretically at that point, you could just type "Kevin Eastman" (or whatever subject) and instantly get a complete inventory of that subject in the Archive.
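As a toy illustration of the category-code idea, here is a short Python sketch of an in-memory archive in which every document carries one of the categories named above, and a search for a name returns the matching documents grouped by category. The class, the document IDs and the sample texts are invented placeholders, not real Archive entries, and a real system would sit on a proper database rather than a Python list.

```python
from collections import defaultdict

CATEGORIES = {"CORRESPONDENCE", "AARDVARK COMMENT", "FOLLOWING CEREBUS",
              "NEWS CLIPPINGS", "FANZINES"}


class Archive:
    """Toy in-memory archive: documents tagged with a category code."""

    def __init__(self):
        self.documents = []  # list of (doc_id, category, text)

    def add(self, doc_id, category, text):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.documents.append((doc_id, category, text))

    def search(self, phrase):
        """Return {category: [doc_id, ...]} for documents containing the phrase."""
        hits = defaultdict(list)
        needle = phrase.lower()  # case-insensitive match
        for doc_id, category, text in self.documents:
            if needle in text.lower():
                hits[category].append(doc_id)
        return dict(hits)


# Placeholder documents, made up for this example.
archive = Archive()
archive.add("letter-0001", "CORRESPONDENCE", "Dear Kevin Eastman, thanks for the pages.")
archive.add("ac-0001", "AARDVARK COMMENT", "A reader asks about Kevin Eastman.")
archive.add("clip-0001", "NEWS CLIPPINGS", "Local cartoonist opens studio.")
print(archive.search("Kevin Eastman"))
```

Keeping the category on every record is what makes the "sourcing" possible: the same search tells you not just that a name appears, but whether it appears in a private letter, a published letters page or a newspaper clipping.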

Mike - We're at least far ahead on the fact that virtually none of the Cerebus Archive is in audio form. The few speeches I gave, I always printed a complete transcript in the back of the book. There might be a few recordings of convention panels out there that would need to be transcribed, but really, not enough to be a major concern. Just the cherry on the sundae.

I appreciate all the time and trouble you've taken with this!

Mike Battaglia said...

Pffft, Dave... it's not any trouble at ALL. Consider me a 'ready, willing and waiting' volunteer. It's an honor. My fanboy antennae were vibrating wildly for the entire duration of the research. I just hope some components of the resulting text walls will yield something helpful.

Sean R said...

Barry and Dave,

Some tests would need to be made to make sure that the "boots" have high enough quality to do OCR with. You don't want to make a bunch of work on the back end just to save scanning time. OCR can be finicky with certain typefaces and different resolutions. Just as a for instance, Dave's typewritten (i.e. with a typewriter) messages sent over fax are gibberish to my OCR, as the low-res combined with an unusual face makes for many many many errors. So one would have to test out the available bootleg sources to make sure you're not just creating a much more error prone document in order to save front-end scanning time.
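One simple form such a test could take, sketched in Python: hand-type a sample page as the reference, OCR the bootleg scan and a fresh scan of the same page, and compare the character error rate of each. The function names and sample strings are invented for illustration, and the acceptable error rate is a judgment call not settled here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def character_error_rate(ocr_text, reference):
    """Edit distance divided by reference length; 0.0 means a perfect match."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)


# Invented sample: "d" misread as "cl", a common low-resolution OCR slip.
reference = "aardvark"
print(character_error_rate("aardvark", reference))
print(character_error_rate("aarclvark", reference))
```

If the bootleg scans score noticeably worse than fresh scans on the same pages, the time saved up front is likely to be spent again on corrections.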

Great ideas all around! Mike, while it sounds like they found a system that worked best for them, I wonder if a free database system (i.e. a Wiki-type open source system) might work just as well for the purposes we're discussing here? I might make an inquiry re: free or near-free database solutions to Cerebus Restoration Alum Dr. Mara (currently working on database solutions for scientists in North Carolina).

Mike Battaglia said...

Hey Sean,

What are your thoughts about how OCR would handle Dave's unique language nuances that are designed to produce emphasis -- the utilization of all caps, semi-caps, quotations and so forth? Obviously those faxes will have to be manually retyped from top to bottom, but what about the cleaner scanned material? These nuances (if that's the right word) strike me as an important component of Dave's written voice, and I'm wondering how much would have to be manually recreated to preserve it, post OCR.

Sean R said...

Hey Mike,

It depends on the software! OCR is very different from type to type, and making sure that formatting (particularly caps and italics) stays intact would need to be a must when picking a method.

Also, I don't think anything on the typewriter would need to be retyped, just scanned at a higher resolution, which can improve the character recognition (more detail to recognize :) )

Dave Sim said...

I think we also have to take into account that OCR has got to be improving on an on-going basis: first of all at the "High End": picks up everything and interprets close to perfectly, but costs more for that level of accuracy. But, then, (gradually or not gradually) the technology comes down in price and improves in accuracy.

That might be a likely "next step" to consider: what Archives feature a wide spectrum of clarity to the material that they're scanning?

I'm thinking of second- and third-generation carbon copies which are not unusual (pre-1970) as primary typewritten records and can be "challenging" to interpret just by sight, let alone by electronic means. I'm assuming that any Archivists who have a large body of "fuzzy carbon copies" to scan are watching...closely...for that particular advance. OCR that's sharp enough to distinguish letters on a 3rd generation carbon copy would likely have little to no trouble with 98% of the Cerebus Archive (all notable exceptions, like Mary Hemingway's photocopied carbons of her Africa diary, duly noted).

Barry Deutsch said...

So in case anyone is interested, I looked around and found only one pirate comics site that has a complete run of Cerebus, including letter columns. It's located here.

Since these sites come and go, it might be worthwhile for someone who is archival-minded to download all of those pages before this site disappears (a program like HTTrack could be used to download the entire thing).

I tested a page of letter column with i2OCR (an online OCR program that can handle text printed in columns), and it did a pretty good job of translating the text with oddities intact, although it wasn't quite perfect.

Mike Battaglia said...

@ Barry -- oh the irony: ripping off Dave only to inadvertently facilitate the archival process. Or... "How NICE of them to do ALL of that scanning for the cause! Positively selfless!"

I'd be happy to download the whole shebang and isolate whatever pages are needed, then email the results to Mr. Fisher. I'll wait to get permission from Dave (Sim) before taking any steps in that direction.