AI Vibe Check - GAI art versus LLM writing

Computer-Generated Poop Ahead

Content warning for the sensitive: the full version of the term for which "BS" is the common abbreviation will be used extensively throughout this piece.  In particular, I will be referencing this scholarly definition of it, as contained in this paper.  Note that the paper also makes extensive use of the uncensored term and is, as a result, one of the more profane professional papers I can recall reading.  But it's worth reading.

In any case, expect some crude but professionally defined language after the blank lines.  If you'd rather not deal with that, the executive summary is that I do find a qualitative difference between using generative models ("AI") for writing and for art, although in neither case do I consider the outcome creative unless it has undergone significant transformation or curation by a human.

(Cuts don't work here like they do other places, so here's some blank space if you regret clicking on the link and would rather not read crude language.)











One of the big pieces of news at the start of September 2024 was that NaNoWriMo was not only sponsored by an LLM (Large Language Model) company, and not only encouraging people to use LLMs in pursuit of their wordcount, but had also posted a statement to the effect that, well, here it is:

(Text of the image above is reproduced below.  Text is from the NaNoWriMo site, although it might have changed by the time you read this thanks to public backlash.)

"We also want to be clear in our belief that the categorical condemnation of Artificial Intelligence has classist and ableist undertones, and that questions around the use of AI tie to questions around privilege."

This is, of course, a clear case of trying to use Lefty Guilt against itself, a definite case of (probably) human-generated bullshit.

But on to the pith of the matter.  I have a lot of friends who are writers, a lot of friends who are artists, and as an educator I am under pressure to consider ways to use LLMs ethically...should that even be possible.  So I'm bombarded with a mix of "They're coming for our jobs," "AI is theft," "It's gonna happen so we need to be ready for it," and of course, "AI is wonderful and will make your job so much better!"  Most of what I get is one-sided (with a notable exception), but at least I'm getting multiple sides from multiple sources.

What's my gut feeling?

Generative AI (GAI) art is a lot closer to being a valid means of expression than Large Language Model (LLM) writing as both currently stand.  Also, I doubt it'll make my teaching job easier or much harder, or endanger it.  That last bit isn't the point of this essay, though.

Before I explain this, I should get some of the basic anti-AI arguments out of the way.  Not that I expect anyone to be convinced by this; I just want you all to know where I stand, and why that stance isn't "burn it all down," as I see so often from some quarters.  If you don't care, skip down to the picture of the kitten.

To address the "AI is theft" claim first, all current models whether art or writing are just correlation engines of increasing sophistication.  Training data is no more theft than using a dictionary is plagiarism.  The use to which "spicy autocomplete" is put, that can be theft.  Theft doesn't happen at the training, or even at the generation of the piece, it happens when some human or "not human but legally a person you can't put in jail" corporation decides to use the piece as a way to not pay a human creator for work that they could have been hired to do, or that substantively resembles their existing work.  

I liken this to what Jim Baen used to say about book piracy: if they weren't going to buy it in the first place, pirating it doesn't cost any sales...the actual customers will pay for the work rather than go through pirates.  And the success of the Baen Free Library, Baen CDs, and Dahak's Orbit proved him sufficiently right that the company kept it up.

Similarly, someone telling Midjourney to make them a piece in the style of Jim Lee doesn't take any money from Jim Lee if they were never going to commission him anyway.  But if, for example, Disney decides to use Generative AI to create new Jim-Lee-like cover art instead of hiring the man, then it becomes unethical and theft IMO (even if they have plenty of signed contracts saying they can do it).

That, by the by, is the real danger of theft with GAI and LLMs.  Not Timmy Fanboy asking ChatGPT to write him a story about Megatron or asking Midjourney to draw Deadpool in Mike Allred's style.  Those are just the same fanfic and fan art, but faster.  It might hurt people who sell unauthorized style-copy pieces at conventions, but not the original creators.  What's going to cause problems is when the big IP farms like Disney or Warner Brothers decide they have a good enough program to churn out art and stories without paying anyone.  And any laws that might get passed saying "training data is theft" will only strengthen their positions, because they will own all the IP they feed into their models, while also being able to crack down on anyone who draws one of their characters from reference.

To sum up, I don't find the use of GAI and LLM systems to be inherently theft or unethical, but I can trust big companies to use them that way, and I'm pretty sure a lot of the panic over them is just helping them get laws passed that favor their planned abuses.

This royalty-free kitten is confused by AI.


Now to the meat of this essay, and the reason I'm taking time away from playing computer games to put together all of this rather than making some terse BlueSky posts or something.

As noted in the linked paper at the top, all of our "AI" models are bullshit machines, or BMs.  BMs have a "reckless disregard for facts" at best, and their users seek to deceive the reader or viewer at worst.  BMs know what stuff goes next to what other stuff, and can shape things on a larger scale with improved training data and algorithms.

BMs cannot:
  • Fact-check (everyone knows this one)
  • Maintain significant large-scale coherence (like, plots)
  • Revise a work by repeating it with changes (they just make a whole new one)
  • Evaluate in any meaningful way
What they have been getting better at, though, is creating a desired vibe.  They can create art in the style of any number of artists.  They can make music in any genre you ask for, even copying some specific composers.  Lately, they can even write in the style of some authors.

Of course they can copy style; these programs are trying to model how humans learn things, and anyone who's seriously studied any creative activity has probably been advised (or told) to copy the style of others as practice.  Paint in the style of Rubens, write a short story in the style of Hemingway, write a song in the style of a big 1930s movie musical, etc.  Do they actually do this the way our brains work?  Probably not, but we don't really know how we do it either.  It's only a model, as Monty Python noted.

If you want to do nothing but put in a prompt and see what happens, the outcome will probably feel about right these days, whether it's visual art or music or writing.  But the writing, especially in longer form, is going to be the most obviously "off" somehow.  It will cite fictitious sources, it will have a plot that jumps around worse than one of those "everyone writes a line and passes it on" party games, and at the very best it will need a lot of editing for content, even if it's supposed to be fiction.

Meanwhile, now that things like the number of fingers are largely problems of the past, a "push the button and see what comes out" piece of art will probably be...okay.  Maybe not what you wanted, but if your criteria are loose enough, it'll probably do.  Same with the music.  If you just want something to represent an NPC in your TTRPG, or some royalty-free background music for your TikTok or YouTube video, it'll be a lot closer to usable than an essay or story will be.

However, if you're pickier, you can't tell the program "Please do the same thing, but from thirty degrees to the right and under lower lighting."  Or "Please write the same music but in a major key rather than minor."  It will, as noted above, just make an entirely new thing and maybe incorporate some of your preferences but mess up something else.

Now comes the curation step, which is easier with art than with the other kinds of output, because our brains are good at quickly scanning images and picking out what looks good and what doesn't...writing and music require more time to process, and this can get prohibitive for longer pieces.

But regardless of the output, you can generally take the ones you like and add them to the prompts to try to refine the next round of "gens."  This can get you closer to what you want, but unless your wants are pretty common it might still take a long time to get acceptably close.

Here's where the human goes past curation and into actually participating in the art.  Pushing the button isn't enough.  Crafting the prompt is almost never enough.  Gotta go and edit that stuff.

Tumblr user DeepDreamNights (Trent Troop) has a lot to say about this process.  Not only does he pick the best of multiple gens, he splices things together, adjusts colors and positioning, and even at his quick-and-dirtiest puts in at least as much work as an action figure photography comic.  It's composition and layout and taking raw materials to make a complete piece.  Here's where continuity comes in, because the guy's shirt might be red in one gen and purple in another, but he needs it to be blue all the time so blue it becomes.  That sort of thing.

Thus, while it might still be faster than drawing everything by hand, or even than using something like Poser or Garry's Mod (yes, dating myself with that one) to create art, it still takes a lot of work to get something good out of the GAI.

To circle back, regardless of how much or little effort goes into GAI art, it will at least hit the desired vibe if the prompt is even halfway competent.  Bullshit is all about a vibe.

Unless the desired vibe includes "incoherent mess," though, an LLM-written story is not gonna get to a good output no matter how much curation you do.  The longer the piece, the worse it gets.  Want continuity?  Forget about it; each story is its own thing...the best you could do is put a bunch of preconditions into the prompt and hope the pieces sorta fit together.  Unless LLMs get a lot better than I think is possible given their core assumptions, you're only ever going to get things that look okay until you look too closely.  The six-fingered-hand equivalents might not jump out as much, especially if the model is bullshitting about facts not known to the user of the LLM.  (Aside: longer-form stories are hard for BMs in general; you can't get a GAI comic or movie to work without a lot of back-end editing by humans.)

This is not to say an LLM is useless to a writer, just that it's going to need so much work at the backend that it'd be quicker to write your own piece than try to fix the LLM output.  Using the LLM in the same way "plot cards" or "creative whack packs" are meant to be used, as an inspiration and starting point, that could be useful to a writer.  Ask it for a story synopsis, then use it as a prompt for human writing.

An LLM might also be able to write a decent poem, in the same way "cut up" methods can...mostly random gibberish trash, but sometimes it'll be kinda neat.  Instead of cutting up a few pages of a book and scrambling the words, it cuts up thousands or millions of books, although you might need to tell it to avoid making coherent sentences if you want the true cut-up effect.  That sort of poetry is closer to visual art (particularly Dadaist visual art) in any case.
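(If you've never played with the cut-up method, the manual version is simple enough to sketch in a few lines of Python.  This is purely illustrative toy code of my own, not anything from the paper or from an actual LLM, and the names and numbers in it are made up for the example.)

  import random

  def cut_up(text, num_words=30, seed=None):
      """Classic Dadaist cut-up: scatter the words of a passage, then draw some back out at random."""
      rng = random.Random(seed)           # seed it if you want a repeatable "poem"
      words = text.split()                # "cut" the page into individual words
      rng.shuffle(words)                  # toss the scraps in a hat
      return " ".join(words[:num_words])  # pull a handful back out and call it poetry

  # Feed it any page of prose and see what falls out.
  sample = "Any page of prose will do here, the longer and stranger the better."
  print(cut_up(sample, num_words=8, seed=42))

An LLM doing the same job is, in effect, working with a much, much bigger hat.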

Outside of some very limited and short-form pieces, though, I don't think LLMs are at a point where they can help create stories of the same quality, and with the same effort, as GAIs can help create art.  The demands of the different media are too divergent, and GAIs are better at bullshitting plausible visual art than LLMs are at bullshitting coherent stories or factual information.

"Push button, make art/story" is certainly the sort of thing the oligarchs have wanted for a long time, and absent some human interaction at the back end I don't consider simple GAI or LLM output to be valid artistic expression, no matter how detailed the prompt was.  Good prompting is a skill, yes, but it's the skill of commissioning a piece, not of making one.

I certainly don't think a 50,000 word "novel" written entirely by an LLM would be even remotely readable, but I've seen some pretty impressive GAI art, even raw output with no curation.  Artists like Trent Troop have demonstrated that with some talent and work the GAI output can be made into Good Art, but saying you can take that 50,000 word LLM piece and edit it into readable shape without simply rewriting it from scratch?  That's bullshit right there.

Dvandom, aka Dave Van Domelen, is an Associate Professor of Physical Science at Amarillo College, the maintainer of one of the two longest-running Transformers fansites in existence (neither he nor Ben Yee is entirely sure whose was first), someone who already deals with plenty of human-generated bullshit writing, an occasional science advisor in fiction, and part of the development team for the upcoming City of Titans MMO.




