Less-sloppy AI (but turns out still unethical), August 11, 2025
by Mike Russo (Los Angeles)
Related reviews: ParserComp 2025

(This review was originally posted on the IntFic forum during ParserComp. After it was posted, the Comp's results were released, and they contained strong indications that the author was complicit in voting irregularities that benefited the two games he'd entered into the competition, although no smoking-gun proof was discoverable. I'm keeping this review up for accountability's sake, but revising its previous two-star rating to one, as under the circumstances I don't think anyone should pay attention to or play this game, and I regret the time I spent giving it substantive feedback. As the conclusion to the review-as-written said, turns out things could always be worse when it comes to AI proponents.)

I am on the record as being grumpy about generative AI – including in this very thread! – so when I saw that Mystery Academy is an LLM-centric game, with its itch page talking up features that mainstream IF has generally discarded as pointless or actively bad design (stacking multiple actions in a single input line, adverbs/tone), I admit that it was hard to put that grumpiness aside and keep my mind as open as it gets at my advancing age. So I’m as shocked as anyone to report that I actually kind of liked this? It helps that Mystery Academy, per the about text, is a custom-built and trained system rather than one of the off-the-shelf programs, and most of the important prose (like the case files setting up each segment of the game) seems human-written. There are the inevitable issues with lag, and I have a suspicion that some of my failure to solve a single one of these cases was due to the chatbot yes-anding my questions, so I’m not a convert yet, but this might finally be the first LLM game that I think is basically OK.

A lot of that has to do with the constraints the design imposes on the game: you’re a junior detective tasked with solving minor-league cases. The game cites Encyclopedia Brown as an inspiration, and that’s definitely the territory we’re in, as each of the three mysteries on offer has to do with the theft of a valuable object with a minimum of bloodshed or skullduggery. Neatly, each crime has three and only three suspects, and your boss is an efficiency-minded chap who requires you to ask at most three questions of each of them. You get an introduction and the aforementioned case file at the top of a case, then it’s just a matter of choosing which suspect to interview first, asking your three questions, doing the same with the other two, and making a final accusation. The advantage of this focused setup is that it leans into what chatbots are good at (mimicking human conversation) and away from where they struggle (consistent world-modeling), while the three-question limit discourages the player from asking silly or absurd questions that could break the simulation, or from letting things go on long enough that hallucinations or inconsistencies start to sneak in.

The writing is also frequently charming, which helps build goodwill and reassurance that you’re not in for typical AI slop. It’s nothing fancy, but it fits the gentle middle-school vibe, lending some character to proceedings. I liked this description of an avant-garde piece of music:

"They say the first performance was held in total darkness, lasted 7 hours, and included instructions like 'play what the cello might have said if it had lied.' Forty-seven people fainted. Two went temporarily catatonic."

Dialogue from the different suspects is also pretty solid, with the wordy teacher bringing an appropriately Brobdingnagian vocabulary to every response, and the system does seem more sophisticated than just keyword-matching, with some ability to detect and respond to the nuance in your questions, which makes the interrogations feel responsive. With that said, I did run into some hiccups with the writing – “you understand why Theseus needed a spool of thread to navigate his own maze” prompted a double-take – and while the game hypes up the interrogation sections as core to the game, after playing through three cases I feel like they might actually be a sideshow?

See, in each of them, I think the information you need to crack the mystery is right in the introduction and case file, and in two of them I actually managed to second-guess myself out of the right solution after talking with the suspects (spoilery details: in the first case, I immediately noticed that doing a “midyear assessment” on the first day of school seemed odd, but Croft had a superficially plausible explanation, and without the ability to check with any of the students who would have been taking the test, I wasn’t sure whether he was lying; in the second one, the footprint-size clue seemed way too honkingly obvious, and I wound up fixating on a control panel that had been left clumsily open at the crime scene, which seemed to align with what the game was telling me about one librarian being fussy and the other being slovenly). Then in the third case, I couldn’t get a suspect to provide any explanation for a potentially incriminating clue (the engineer told me that of course there was a lot of oil around the ship when she was doing maintenance, but didn’t have an on-point response when I asked how it got on the captain’s ladder, which presumably isn’t near the engine room), even though she turned out to be innocent. LLMs are BS machines, and I guess most guilty suspects likewise want to BS their way out of getting caught while jittery innocent ones sometimes accidentally fumble a question, so I suppose this is plausible enough. But after realizing that in every case I would have been better off if I just hadn’t questioned anybody, I felt like I’d figured out the magic trick: what seems like the meat of the game is just misdirection.

So I don’t think this kind of LLM-based approach is going to replace actual detective IF anytime soon, since stapling a static Encyclopedia Brown story to a chatbot is a novelty, but not much more. And I did run into some technical issues: every command I typed took 5 to 10 seconds to process, and in the third case the question limit didn’t seem to be enforced. Still, Encyclopedia Brown stories are fun (I still remember the gag in the first one I read as a kid, about a commemorative cavalry saber from the first Battle of Bull Run), and Mystery Academy’s good-natured vibe meant I had some moments of enjoyment with the game even as I was critiquing it. I continue to be deeply skeptical that generative AI is the future of IF, but if this is the kind of experiment we’ll get along the way to establishing that, things could be worse!

You can log in to rate this review, mute this user, or add a comment.