I’ve been a software engineer for twenty years, and I have experience with artificial intelligence (“AI”) algorithms going back to the mid-2000s, when neural networks, now the dominant paradigm of statistical learning, were considered wildly impractical. I’m also a novelist; I published an ARC (in the hope of not needing an expensive galley campaign) of Farisa’s Crossing to Royal Road in April 2024. I won’t claim to be a world-class expert on artificial intelligence or all of literature, but very few people match my level of experience on both topics, so it would be an understatement to say that I have an informed opinion. Can AI write artistic literature? No. Can it come close? Also no. There is too much raw humanity in that kind of hardcore writing for real achievement to be made via statistical mimicry. What about the bestseller—will AI solve that reinforcement learning problem? Probably soon.
To explain why this is the case, let’s get into how language models actually work.
Spelling
Below is a real-world, production typo. How would you correct it?
“Farisa scratched her eblow.”
The software I was using miscorrected it to “Farisa scratched her below.” It accidentally made an ordinary sentence—an action beat, suggesting the character’s mild discomfort—sexually suggestive. So what happened here? Well, the first-order goal of spelling correction is to turn nonwords (misspellings) into adjacent words deemed to be most likely what the writer intended. Of course, “wordness” is not a binary distinction. “Red” is almost everywhere accepted as a word in the English language. “Farisa” is a word because it’s a name, but not in the dictionary. “Spudlicrux” is probably not a word. To be rigorous about this, we assign fluency scores to strings of letters; we can start by assuming each has an absolute probability corresponding to how frequently it is used in fluent (that is, judged to be grammatical) speech or text:
f(Farisa) = 10⁻⁷
f(scratched) = 5 * 10⁻⁶
f(her) = 2 * 10⁻³
f(elbow) = 3 * 10⁻⁵
f(.) = 5 * 10⁻² — we treat punctuation as “words.”
… and also:
f(below) = 5 * 10⁻⁵
f(eblow) = 10⁻⁹
If we arbitrarily set the cutoff for wordness at an emission probability of 10⁻⁸, we recognize that “eblow” is not a word. Both “elbow” and “below” are a single letter-swap away, but because “below” is the more common word, a context-blind corrector concludes that “eblow” is probably a misspelling of “below.” So we would make the incorrect correction above.
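Here is a minimal sketch, in Python, of how a context-blind corrector makes exactly this mistake. The frequencies are the illustrative numbers above, not measurements from any real corpus, and the candidate generator only considers adjacent-letter swaps, a drastic simplification of real spelling correction:

```python
# Toy unigram "fluency" scores: how frequently each word appears in fluent text.
# These numbers are illustrative, not measured from real data.
FLUENCY = {
    "Farisa": 1e-7,
    "scratched": 5e-6,
    "her": 2e-3,
    "elbow": 3e-5,
    "below": 5e-5,
    "eblow": 1e-9,
    ".": 5e-2,
}

WORDNESS_CUTOFF = 1e-8  # anything rarer than this is treated as a non-word


def one_swap_neighbors(token, vocabulary):
    """Vocabulary words reachable by swapping one pair of adjacent letters."""
    swaps = {
        token[:i] + token[i + 1] + token[i] + token[i + 2:]
        for i in range(len(token) - 1)
    }
    return [word for word in vocabulary if word in swaps]


def correct(token):
    if FLUENCY.get(token, 0.0) >= WORDNESS_CUTOFF:
        return token  # already a word; leave it alone
    candidates = one_swap_neighbors(token, FLUENCY)
    # Context-blind choice: take the most frequent candidate.
    return max(candidates, key=FLUENCY.get, default=token)


print(correct("eblow"))  # -> "below", because 5e-5 > 3e-5: the wrong fix
```

A corrector working at this level has no way to prefer the word the sentence actually wants over the word that is merely more common.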
Context
The words that appear in a sentence are not, of course, statistically independent. We can achieve better corrections by using a word’s neighbors to refine our assessment. If we treat the words as independent emissions, then the probability of “Farisa scratched her elbow.” is 1.5 * 10⁻²¹; that’s what we get when we multiply those probabilities together. The real probability is probably higher—more like 10⁻¹⁵. These probabilities are all contextual, both internally—Farisa would not scratch “her automobile” or “her superfluous”—and externally—in a story with a character named Farisa, that probability increases. In other words, the probabilities we care about are conditional and, in practice, relative. We don’t care about the absolute probability for a sentence—the longer it is, the more “improbable” it becomes not because it is not fluent but because there are more possibilities.
We achieve better repair by using context. Perhaps our fluency model finds:
f(“Farisa scratched her elbow.”) = 10⁻¹⁵.
f(“Farisa scratched her below.”) = 10⁻¹⁸.
f(“Farisa scratched her eblow.”) = 10⁻¹⁹.
We thus would make the most useful correction.
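The same toy setup, extended with the sentence-level scores above (again, illustrative numbers standing in for a real fluency model), now recovers the intended word:

```python
# Sentence-level fluency scores from the example above (illustrative numbers).
SENTENCE_FLUENCY = {
    "Farisa scratched her elbow.": 1e-15,
    "Farisa scratched her below.": 1e-18,
    "Farisa scratched her eblow.": 1e-19,
}


def best_repair(sentence, suspect, candidates):
    """Try each candidate in place of the suspect token; keep the most fluent result."""
    variants = [sentence] + [sentence.replace(suspect, c) for c in candidates]
    return max(variants, key=lambda s: SENTENCE_FLUENCY.get(s, 0.0))


print(best_repair("Farisa scratched her eblow.", "eblow", ["elbow", "below"]))
# -> "Farisa scratched her elbow."
```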
Context is also important to handle nuance. Words often have multiple meanings with little or nothing to do with each other. A classic example is “bank.” If the word occurs in proximity to “money” we are probably talking about a financial institution; if it occurs in proximity to “river” we are probably talking about the place where land and water meet. We therefore can’t understand a sentence based on its words in isolation; neighboring words matter.
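A crude way to see this is to check which sense’s typical neighbors actually show up near the word. The cue lists below are mine and purely illustrative; a real model learns these associations from co-occurrence statistics rather than hand-written lists:

```python
# Toy word-sense disambiguation for "bank": pick the sense whose cue words
# appear in the surrounding sentence. Cue lists are illustrative, not learned.
SENSE_CUES = {
    "bank (financial institution)": {"money", "loan", "deposit", "teller"},
    "bank (edge of a river)": {"river", "water", "shore", "mud"},
}


def disambiguate(sentence):
    words = set(sentence.lower().split())
    # Choose the sense sharing the most words with the sentence.
    return max(SENSE_CUES, key=lambda sense: len(SENSE_CUES[sense] & words))


print(disambiguate("she walked along the bank of the river"))  # edge of a river
print(disambiguate("he took a loan out from the bank"))        # financial institution
```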
Knowledge
Fluency scores can encode real knowledge. Continuing with the example above:
f(“Farisa scratched her elbow.”) = 10⁻¹⁵.
f(“Farisa scratched his elbow.”) = 10⁻¹⁸.
A fluency model therefore “knows” that “Farisa” is a female name. It also knows, having scanned millions of words and found that “orange” occurs in proximity to “red” and “yellow” more often than to “blue,” that orange is closer to those colors, even though it, being an algorithm and not a person, has never experienced color.
There’s a dark side to this. We can investigate the intermediate encodings (or embeddings) of words that neural networks use and we’ll often find knowledge it has discovered about which words, based on the text it has ingested, appear to be most similar. The model can learn that:
bank(1) = “bank” + financial
bank(2) = “bank” + natural feature
These embeddings represent words as vectors, or ordered lists of numbers that can be added to make useful combinations. For example, a model will often find that:
king = royalty + male
queen = royalty + female
man = adult + male
woman = adult + female
king - man + woman = queen.
This is never explicitly coded into the model—there are millions of analogical tetrads these models learn—but inferred from millions, billions, or (for a modern foundation model) trillions of words.
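To make the arithmetic concrete, here is a toy version with hand-built two-dimensional vectors, one axis loosely standing for “royalty” and the other for gender. A real model learns hundreds of dimensions on its own, and the analogy holds only approximately there:

```python
import numpy as np

# Hand-built toy embeddings (illustrative): axis 0 ~ "royalty", axis 1 ~ gender.
EMB = {
    "king":     np.array([1.0,  1.0]),
    "queen":    np.array([1.0, -1.0]),
    "prince":   np.array([0.9,  0.8]),
    "princess": np.array([0.9, -0.8]),
    "man":      np.array([0.0,  1.0]),
    "woman":    np.array([0.0, -1.0]),
}


def nearest(vector, exclude=()):
    """Word whose embedding points most nearly in the same direction as `vector`."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(
        (word for word in EMB if word not in exclude),
        key=lambda word: cosine(EMB[word], vector),
    )


result = EMB["king"] - EMB["man"] + EMB["woman"]
print(nearest(result, exclude={"king", "man", "woman"}))  # -> "queen"
```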
The bad news is that, because all this text was generated by people, and some people have shitty biases, there’s a risk of models also developing those biases. A model might “learn” that “doctor - man + woman = nurse”, which is both incorrect and offensive.
There are other issues. Emission probabilities are very small numbers—there are a massive number of fluent sentences. Let’s say that P is a genuinely correct fluency model—in practice, whatever we build will be a mere approximation—while Q is exactly like P, except that it overestimates “This harmful sentence stands in for awful behavior in general and should never be uttered by anyone.” by 10⁻³⁰ because that sentence occurs in our training set. Now, 10⁻³⁰ is basically zero—it’s less than one-millionth the probability of getting thirteen spades in a hand of Bridge two rounds in a row—so we are unlikely to encounter this sentence in ordinary use. Still, these models are used in practice to estimate conditional probabilities, and we’re often conditioning on complex observed data that have similarly low probability, so it’s possible that a context will be discovered—especially if an attacker is trying to do so—that brings that discrepancy to a measurable level. This is believed to be where training data memorization (an undesirable trait in a language model, which ought to perform equally well on sentences it has never seen) and unintentional plagiarism come from.
Beyond this, there is a strong argument that large language models, though they store knowledge, do not actually “know” anything, because their pattern-matching skills condition them to manufacture (hallucinate) knowledge when they really possess none. They do not seem to reflect on what they know, what they don’t know, how they know what they know—so much as they fluently imitate people who know what they are talking about.
Reinforcement Learning
Fluency scoring can be used for interpolation (“Farisa scratched ??? elbow.”) and next-word prediction (“Farisa scratched her ???.”) When it comes to sentence completion or grammar correction, we usually want to choose the most fluent sentence in the context of the emitted one. That is, we are inferring the intended sentence—which we don’t know, so we need a probabilistic model—from the written one on the assumption (which does not always hold) that any departures from standard grammar are accidental. In text generation, though, using the most probable next token results in very uninteresting emissions; doing so will bias the program toward choosing only common words. A variable called temperature is used to moderate this effect; at high temperature, words deemed less fluent are tried more often—at a temperature of zero, only the most fluent next word is ever considered. This is analogous to a known tradeoff in machine learning between exploration—considering more possibilities—and exploitation—local maximization of the given objective function.
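Here is a minimal sketch of temperature at work, with made-up scores for a handful of candidate next words; the words and numbers are mine, chosen only to show the mechanism:

```python
import math
import random

# Made-up raw scores (higher = deemed more fluent) for candidates that might
# complete "Farisa scratched her ___".
LOGITS = {"elbow": 2.0, "arm": 1.5, "cheek": 1.0, "grimoire": -1.0}


def sample_next(logits, temperature):
    """Sample one word from a softmax over the scores, rescaled by temperature."""
    if temperature == 0.0:
        # Greedy decoding: always take the single most fluent option.
        return max(logits, key=logits.get)
    weights = {w: math.exp(score / temperature) for w, score in logits.items()}
    total = sum(weights.values())
    r = random.uniform(0.0, total)
    for word, weight in weights.items():
        r -= weight
        if r <= 0.0:
            return word
    return word  # numerical edge case: fall back to the last candidate


print(sample_next(LOGITS, temperature=0.0))  # always "elbow"
print(sample_next(LOGITS, temperature=1.5))  # rarer words appear more often
```

At zero temperature the output is repetitive but safe; as the temperature rises, the long tail, including the occasional “grimoire,” gets sampled more often.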
These probabilities, of course, are all conditional. “Eight” is not more fluent than “Seven” in any absolute sense but, in a dialogue, “Eight” is the more fluent response when the query is, “What is five plus three?” Of course, there is no way to explicitly store this complex probability distribution—the number of conditions we might impose is nearly infinite—and we do not even know what “the real one” is—so we infer it, approximately, from text we consider grammatical, fluent, and sufficiently modern. After millions of processor-hours of training, we can build a model that shows basic competence at a large number of language-processing tasks.
In the context of a user-facing program, though, it is not only fluency we desire. Perhaps “Fuck you” is a more fluent response than “I feel excellent today” to “How are you?” Still, we wouldn’t want a chatbot to favor the former. A raw foundation model can understand language very well—it recognizes patterns, due to the immense computational effort invested in training—but that doesn’t mean it does things with that knowledge that we like. These agents must learn from human feedback what is appreciated and what is undesirable. First we train the model to be capable; then we train it to be useful.
Literature
Machines can mimic human intentionality, but they do not possess it. They can write 200-word clickbait articles that humans prefer over those written by real people, but there is no artistic process. These models are also, in their own way, lazy. Estimators of mathematical functions (or probability distributions) will prefer the simplest one that fits the data—and we want them to do this—rather than show creativity, which they do not have. An AI that is trained to play a video game, if it discovers an exploit that enables it to cheat its way to an infinite score, will do so. The difficulty involved in crafting serious artistic literature is something machines, if they even had the ability to participate in real human experience, would shy away from. This is not the only reason the artistic novel is under no threat at all, but it’s the easiest one to explain. An artistic novelist might do seven drafts of a story before considering it ready for publication (even then, there are still mistakes; when fixing the biggest or highest-level issue, we miss an occasional lower-level detail) and so much goes into the process that, while a machine could mimic some of the patterns, it would miss the spirit. The most attentive readers would be able to tell.
The commercial bestseller, though? It won’t take long. Consider the outlier success of Fifty Shades of Grey. There is always luck involved, but we mostly know why that book succeeded—Jodie Archer and Matthew Jockers cracked it in The Bestseller Code. The taboo level of the subject matter—slightly outside the Overton Window, but not too far from it—helped, but the main contributor was the whipsaw sentiment curve: the jarring ups-and-downs that made for a reading experience that many found addictive. It’s unlikely that the author intended this; the subject matter of an abusive relationship with an obscenely powerful man was a natural source of lability and conflict. This is not how human authors—commercial or artistic—prefer to write in general. It feels manipulative to intentionally jerk the reader around, just to sell more copies, and almost no author wants to be “hate read” even though hate-reading was a major factor in Fifty Shades’s success. Machines will have no such aversions to optimizing for sales potential. They won’t consider it demeaning to “write to market.” The bestseller is a reinforcement learning problem and they will solve it.
Traditional publishers, although they are covering their tracks on this for obvious reasons, are investigating the capability of AI to generate these sorts of uninteresting books that, while some become bestsellers and others flop—it is certainly not the case that writing a mediocre book ensures a million copies will be sold—are easiest to market. This all said, it will be impossible to market books if they are known to be the products of algorithms. No one wants to read that. Therefore, we will see a fleet of artificial author personalities—they may be backed by real humans, who make in-person appearances but use language models for their social media presence. This will extend further when it comes to influencers, not only in publishing but in the consumer economy more generally, due to the soon-to-arrive ability to generate photorealistic video of human activity. In the 1930s, a calculator was a person; now, it is a computer. Today, an influencer is a person; tomorrow, it will be a process that runs on a GPU cluster.
It is easy to fool one person for one month; I have seen very talented researchers who work in artificial intelligence become convinced that language models are conscious. At first, this seems absurd, but there is nothing in nature but us that uses language to communicate; thus, some early users of language models became victims of pareidolia and confirmation bias. The good news is that it’s very hard to fool millions of people for decades. It may be possible, but there’s no real incentive to do it. It would have been harder to fake the Moon landing in a way that has withstood more than half a century of professional scrutiny than to achieve it. Readers—not literary agents, not publishing executives, but ordinary readers in the millions over time—will figure out which books show true originality, and it is very unlikely that a single one of the AI-generated bestsellers we are about to see will make it into the literary canon. Getting a fake novel onto the New York Times bestseller list will be merely difficult—thousands will try every year, and only a few dozen will succeed—but faking an artistic novel that could stand the test of time is so much harder and, even if it were possible, there will be no incentive to try. It is easier to make $1,000,000 than to enter the literary canon; it is also a sensible financial play to spend $50 on a 1-in-10,000 shot at the million. On the other hand, there is no value at all in one’s novel being remembered for two hundred years if one did not actually write it.
The Future
When it comes to politics, technology, or economics in 2025, anyone making confident statements more than three or four years into the future is at high risk of being wrong. It’s not difficult to see the trends as they are, but there are so many sources of uncertainty that could influence anything or everything. So I will not try to predict the future (because that is impossible) so much as I will assess where, based on present information, things are most likely going.
I do not intend to impose a general hierarchy on literature. I will not claim that some genres are better than others or that specific approaches to writing are morally superior. I only observe that there tend to be three approximate levels of investment in a literary work—hobbyist, commercial, and artistic. I will make my trend assessments using these three categories.
Hobbyist writers are new to the craft and do not know if they intend ever to take a commercial or artistic direction. At this stage, the stakes are low. They don’t know whether to expect six readers or sixty thousand. They tend to write in genres that are off-format by the standard of traditional publishing, but that are also where a disproportionate share of new ideas will come from. Some of the great writers of the mid-21st century are currently teenagers on Wattpad. Hobbyist authors will put serious effort into revision, but only if they have a sense of traction; otherwise, it makes sense to put the time into a fresh new story that will probably be better, as their arc—if they have the talent and keep at it—is steep and upward.
Commercial authors strive to produce professional-quality work, but they have to do it quickly. Three years without a new release will have them forgotten. If they are self-publishers, algorithms will punish them if they step off the content treadmill; if they use traditional publishers, it will strain relationships to slow down to one book per year or less. The time to do five or six drafts, or to make structural edits to fix small plot issues, is not there—especially not in 2025, considering how much marketing work has been foisted onto authors, since most publishers (except on top-tier commercial properties) no longer perform it. No commercial author wants to write bad books, and very rarely does a book become a bestseller because it is bad, but the tenfold increase in writing effort that an artistic novel requires is unlikely to be paid off in sales—it’s a far better bet to write ten books, and be diversified.
Artistic fiction is written much more slowly—it is measured in years per book, not books per year; the total investment, in hours per page rather than pages per hour. Commercial books tend to be within 20% of genre-specific word count targets; artistic novels are all over the place, but tend to run long. These books are also, in general, much harder to market—it is not clear who the audience is until a large number of people have read it. Sometimes these books are adored by critics and win awards; sometimes, they’re misunderstood and trashed. An artistic novelist might only write six or seven novels in their life; the objective is not to write a large number of books, but to produce the best possible ones. Artistic fiction is also exhausting; these writers hate marketing themselves because, if they have the time and energy to spare, they’d much rather spend it on an additional revision, fixing perceived deficits so minute that no one else would even notice them.
It is probably not a choice whether one becomes a commercial or artistic novelist. Some people—including some extremely capable authors, like Stephen King—hate rewriting the same book for the third time. Others do not consider a work finished until it has been through nine rounds of revision. This seems to be as fixed as neurotype.
The distinction between commercial and artistic fiction is sometimes framed in terms of “literary fiction” versus “genre fiction.” This labeling is misleading on multiple counts. To start, artistic fantasy, artistic sci-fi, and artistic mystery all exist. At the same time, today’s definition of “literary fiction” includes constraints that make clear that it is, in fact, a genre. There’s a lot of excellent artistic work in that genre, but it is no less a genre for the fact. In truth, the matters of genre (since all work has genre) and artistic/commercial inclination are orthogonal.
I don’t intend to claim that one mode of production is superior to another. Artistic novelists tend to be far better writers than commercial or hobbyist ones, but that’s because they spend so much time focused on the writing itself. Commercial and hobbyist writers, I suspect, are better storytellers for a rather obvious reason: they tell more stories. The skills you build telling the same story in a better way for the ninth time are different from those you build telling nine stories, some of which are excellent and some of which are duds. Neither is superior to the other. Thus, the tendency we have to associate commercial storytelling with fast-fashion dreck is not indicative of the capabilities of all commercial writers—quite a few of them are indeed excellent—but a product of incentives and what systems amplify.
Even if commercial fiction is not conquered by AI, the processes that market and distribute it will be. AI influencers will out-compete human ones; this is a given, because AI influencers will have so much time and energy. Methods of acquiring social proof, credibility, and reach that are today merely figuratively mechanical will, tomorrow, be literally so. It doesn’t matter much, in this regard, whether traditional publishing rallies or continues to fade, leaving us to return to the historical norm of self-publishing. Either path leads the same way. Prices of commodities used to be set by human judgment; today, fast algorithmic traders fill the role. A market that runs at faster-than-human speed will never allow itself to slow down.
Of course, readers will still want to read books written by humans—that’s inevitable. An AI-written novel was once a novelty, badly written but a novelty nonetheless; now it’s just a disappointment. It is therefore likely that, while traditional publishers will use fleets of AI influencers on rented GPUs to market their books, they will retain human authors for their top-billing commercial endeavors. There will be a lot of competition for these slots, but some autonomy may be sustained there for the authors whose names appear on the work. This might be the best possible outcome for commercial literature—for it to evolve so that writers are left alone to write, with “all the other crap” automated.
Hobbyist fiction will endure for the reason that, even if there are few people to read it, people will still write it. The truth is that, whatever fate befalls traditional publishing, it’s not going to have a huge effect here. The best hobbyist fiction is usually too weird to ever get a corporate book deal in the first place. Still, there will be people attempting to trick readers in these spaces with fake books; therefore, verification of human authorship is likely to become more important. It’s possible that top-tier hobbyist authors—in some cases, as they become full-fledged commercial or artistic ones—will live-stream their creative processes, like elite video game players, either to show that they are not using AI at all, or to showcase technology’s capabilities and limitations.
Artistic fiction (in spite of any perceived hierarchy) has more in common with the hobbyist than the commercial kind and will likewise endure. Its value may even increase in a world that may feature mass-produced AI fiction and that definitely will feature mass-produced AI book buzz. In other words, the flood of trend-matching and socially acceptable but uninspiring work coming out of commercial publishing (which will use AI in at least some of its processes) may redouble the human craving for real human writing. What is probably inevitable is that artistic fiction will be increasingly pushed into self-publishing, which is both excellent and terrible news.
The upshot of this change will be that, if we can increase the independence of artistic authors from corporate patronage, and if we can make self-publishing more affordable and scalable, we’ll all win. The bad news is that we’ve still found no evidence of a replicable process by which self-published work can be discovered. Most people can’t afford to spend tens of thousands of dollars marketing an unproven book, and self-publishers seem to spend just as much time on non-writing marketing tasks as their traditionally published counterparts, if not more. Over the past hundred years, traditional publishing’s selling point, to the public and to authors, has been that “good writing always gets found.” This was never true, not even in the midcentury golden age of traditional publishing, and it is not true for today’s self-publishers either. Without a four- or five-figure investment at a minimum, finding an audience—and building it quickly enough not to be forgotten—is almost impossible no matter how good the work is.
We have yet to solve this problem, but if a solution is found, it will probably come from self-publishers—not from traditional publishing, and probably not from the large technology companies that, unfortunately, have far too much power in the self-publishing ecosystem.
Decisions
In the long run, AI is more likely to save the artistic novel than destroy it. For serious literature, discoverability is basically dead. Most people in publishing—literary agents, editors—will tell you that only about 0.5–1% of finished novels by new authors that are queried will ever get published at all. Quite a number of those will not be well-published; they’ll be spine-out on a shelf near the back of a bookstore for eight weeks, then pulped. Note that I’m not saying there’s a 0.5% chance per agent or per publisher—that number would be much lower—this is the probability that it gets in at all.
It’s easy to believe that because “90 percent of everything is crap” one’s odds improve by a factor of ten if one has talent, skill, and drive. This isn’t quite true, because crap gets into the system for all sorts of reasons, so the real boost, if the writing is good, is about a factor of two or three. This has been verified many times by querying traditionally published and even award-winning novels; they do get an agent’s or editor’s initial support (obviously, they cannot actually be published again) more often than the median slush-pile denizen, but there is no glass elevator for literary excellence. As someone who’s worked in marketing analytics, I’ll tell you that a lift of 2.0 is a solid accomplishment, but it improves the odds only to 1–2%. Persistence and strategy can improve those numbers further, but only by so much—another factor of two is an upper bound, because reputation is a finite resource—and at the cost of time that most authors would prefer to spend writing. Ninety percent of everything is crap, and ninety percent of everything that gets through the system is still crap, but ninety percent of what’s good never gets through, and even much of what gets published will not do very well at all. Most books that achieve traditional publishing are not marketed, are not publicized or reviewed in places the public sees, and are therefore destocked by bookstores (and similarly deboosted by tech-company algorithms) in the first eight weeks and fall into oblivion through no fault of their own. It has been empirically verified that over 90 percent of yesterday’s literary masterpieces do not get through today’s query wall; it is certain that well over 90 percent of today’s will also go forever unnoticed.
You can self-publish, but this has its own issues. As I’ve mentioned, proper self-publishing requires a five-figure investment. Most people can’t afford that. It also relies on untrustworthy algorithms within an enshittifying internet. There are no literary agents, it’s true, but the need to acquire high-status readers—“Booktubers” and “BookTokers” whose blessings are necessary to gain enough exposure to let organic word-of-mouth take hold—does not go away, so I am not convinced that the process is any less antimeritocratic than traditional publishing.
I don’t believe that AI will ever be an adequate substitute for a deep read performed by a skilled human critic. However, that’s not what the bulk of people who show up at the poor door of querying get—a skilled human, possibly; a deep read, surely not. They get 12 minutes between the end of the reader’s 11:00 meeting and the arrival of lunch. If the manuscript scans well, the submitter may be afforded a 28-minute read before a decision is made. The small number of people whom publishing considers important enough to entrust with decision-making power simply don’t have time to read even a fraction of what is out there. Thus, book deals are offered, advances are determined, and publicity opportunities are allocated, on vibes alone, before a grand total of 97 minutes has been spent with eyes on the page. Thoroughness is for editing, not business decisions. This is not because these companies are poorly run; it just does not make a lot of business sense to assess literary quality on its own. If Fifty Shades can be reworked into a bestseller, the quality of the writing does not matter much at all, so why waste money and time—and “time is money”—assessing it?
Authors, editors, and publishing executives all know that, in today’s world, failure or even mere adequacy in the short term means there is no long term. Chain bookstores, being skilled at abusing the consignment model, return books that do not sell effortlessly in the first eight weeks. This means that reader word-of-mouth—potentially exponential, but a slow exponential—is disenfranchised in favor of publisher preselection. Given the number of people who believe they personally deserve to be preselected, or who are at least willing to try, there are too many manuscripts and there is no time to decide on anything but vibes. A deep read comes much later—that’s an editor’s job, after contracts are signed.
Can AI help? Well, there’s a lot that it doesn’t do very well. It can fix grammar and spelling errors, and it can spot some microstructural weaknesses, but I wouldn’t say that it can edit. Compared to a dedicated and skilled human editor, it’s still quite weak. For line editing, it has a high rate of false positives; its suggestions, which tend to be geared toward business correspondence, will make fiction distinctly worse. I wouldn’t use it at all for structural editing; I haven’t evaluated it for this purpose, but I don’t have high expectations at all. There is absolutely no evidence of AI “writing” showing any artistic merit.
As someone who’s been programming software for quite a while, I’m not impressed that large language models generate text, because that’s something computer programs have been able to do for a long time. Does anyone else remember the Markov text generators of the 2000s? The resulting text was awful, but it was legible and sometimes grammatical. We’ve seen significant improvements for formulaic writing (e.g., corporate emails) but nothing groundbreaking. What is genuinely surprising is the ability of these programs to read natural language and extract enough information to respond intelligently to user queries. There’s so much nuance in language that, until very recently, this was considered prohibitively difficult for computers to do. Today, they can. They do not perform a deep read; they perform an impressively competent speed read.
Could AI evaluate manuscripts well enough to separate those meriting a fair and deep read (which most submitters will never get) from those that deserve the perma-slush? Probably. The goal here is not to spot great literature—no one is expecting AI to do this—but to do the same sort of heuristic read on which literary agents and acquisition editors must rely, but faster and at scale. I’ve done the experiment several times where I select five passages, determine how I would rank them from best to worst, and then ask various language models to evaluate them. It doesn’t always reach the same conclusions I do, and even if it did, that wouldn’t necessarily make it correct, but it is reliably close to the expected ranking—far closer than it would be by guessing randomly or even by relying on simple heuristics.
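The experiment can be framed as a simple rank-agreement check. Everything below is hypothetical: the passage labels, my reference ranking, and the ranking attributed to a model, which in practice would come from prompting whichever model is being tested:

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Agreement between two rankings of the same items: +1 identical, -1 reversed."""
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        same_order = (rank_a.index(x) - rank_a.index(y)) * (rank_b.index(x) - rank_b.index(y))
        if same_order > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# My ranking of five passages, best to worst (hypothetical labels).
my_ranking = ["A", "B", "C", "D", "E"]

# A ranking attributed to a language model (hypothetical output: it swaps one
# adjacent pair but otherwise agrees with mine).
model_ranking = ["A", "C", "B", "D", "E"]

print(kendall_tau(my_ranking, model_ranking))  # 0.8; random guessing averages ~0
```

An agreement score near zero is what random guessing produces over many trials; the rankings I have gotten back sit consistently well above that.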
No algorithm—nor any human-run process—exists anywhere that can spot manuscripts deserving of serious publication opportunities with 100-percent accuracy. Still, technology doesn’t have to be perfect to be useful—it only needs to be an improvement over the system that is currently in place. An AI autograder could improve fiction publishing substantially. It would give quick and useful feedback to authors, even the ones who don’t make it in. Although it would not offer the same quality of read as one might expect from a skilled, dedicated human spending real time on the work, it would leave 95 percent of authors with a fairer and more in-depth read than what the current system will ever afford them. Last but not least, the existence of autograders would shift the balance of power in the author’s favor—she would no longer be reliant on the willingness of an agent or editor to vouch that she is any good, because the scores (imperfect indicators, but no less so than today’s social signals) would speak for themselves. Ultimately, if language-processing technologies increase the rate at which good writing gets found to even 5 percent—not 100 percent, just five—then we will have improved the world. Regarding this more modest target, I have absolute faith that it can be done.