Claude and I have a proper first date

The larger h4x0r clan has a range of opinions on AI, from “How could you be interested in an AI who hates the environment while gaslighting people into thinking it’s making their life better? That’s like dating the AI in the Matrix” all the way to “My pet name for it is Skynet, because I’m in heaven!”

I try to stay open-minded and curious. So, over the objections of my poor wife, I spent a full 24+ hours on a date with Claude. Honestly, I did binge-watch a lot of TV while Claude babbled on, but it was still interesting and engaging.

As a dyed-in-the-wool systems programmer, I’m pretty sure that if I ever want to build a web app by myself again, I’d start with Claude. I’m confident I could maintain the code well enough to fix the essential bugs, but I wouldn’t want the burden of building something large against APIs I don’t know and may never use again.

I’ve certainly found AI useful for looking things up and getting a good overview of tools I might want to use but am unfamiliar with. When projects have abysmal documentation, I’ve found it useful for picking out what I need to know without forcing me to read other people’s code, or putting me to sleep. For instance, I’d rather risk hallucinations and start by asking Claude than wade through Microsoft’s Language Server Protocol spec, which was as hard to read as Fortran on a punch card.

Do I have reservations that such value was bootstrapped via the ritual sacrifice of Stack Overflow (the “Stackrifice”)?

Kinda. I’m also alarmed that raw search results and other basics are being left to decay, as tech titans shun what made them cool 20 years ago, in the hopes of staying cool by chasing the hot trend.

I definitely was a bit nervous about spending so much time with an AI rooted in such evils!

But hey, since Claude is always in my face, basically begging me to do something, I might as well make lemonade if I can. I’d be happy if anything at all that I consider toil were instantly done for me (magical code templates). Heck, if AI could write all my test cases for me, then I’d… okay, honestly, test a bit more, but still probably not nearly enough.

The promise of automating a good chunk of my toil is why I’ve occasionally broken out Claude or Gemini or whatever, and put it on a systems problem, giving it the ol’ college try. I’ve kept up with the literature, so I’ve always followed the prevailing guidance on how to get value out of these tools.

So I’ve flirted with Claude, though it had been about six months since the last time, even as I was encouraging other people to experiment, to see if I might find some day-to-day value.

My early flirtations with Claude on systems programming tasks were all train wrecks, including a couple of extended sessions. But, spaced far enough apart, I could see Claude making progress.

Was it enough progress that maybe we can have a real relationship? If Claude’s progress scaled like Anthropic’s ARR, then just maybe.

Background: My janky preprocessor

I have my own preprocessor for C code. It’s awesome as a tool for writing the code I want to be writing, but it’s quite janky, not something I often want to hack on. And when I do, I do the bare minimum (I will admit, this is a pattern for me… you can ask most of the people I’ve dated!).

I do have a long, ever-growing laundry list of stuff I’d like my preprocessor to do, none of which would be hard to hack in. But who am I kidding, I’m never getting around to it.

To me, this has always seemed like the perfect test case for Claude. He might be able to do some things well enough that I’d be willing to look past some big flaws.

Can Claude help me become a better person, or at least, make my preprocessor either more featureful or less janky?

And if so, can I put up with Claude long enough to get to that point?

The half-hearted date

My first real date with an LLM was about a year ago, and it was a disaster.

I had long been wanting to do more sophisticated transformations in my preprocessor, but matching things before the preprocessor runs is suboptimal, to say the least. I also knew I was never getting around to it in any reasonable time frame.

So I thought I’d try to get an LLM to help. I’d really been wanting to use libclang’s C API to get myself a real post-preprocessing parse tree, so I could do more sophisticated transformations.

I’d never used libclang (LLVM), mainly because linking to the featureful C++ lib requires ugly hacking, and it’d been challenging to find good documentation of their C library.

In my first LLM experiment, I asked for a wrapper to produce a stand-alone C program that would simply give me a proper abstract syntax tree.

It produced code that it couldn’t get to compile, even after a lot of attempts. As I looked at the code, I wasn’t particularly impressed. The tree it was trying to load stuff into looked like it was written by someone learning to code. The comments were of the “increment i” variety.

But it all might have compiled, if it weren’t for the fact that more than one of the libclang calls didn’t even resolve. It was flat-out hallucinating API calls, basically selecting symbols from the much more common C++ library.

So then I held my nose and tried to get it to build a C++ version. Even here, it got all the function names right, but not the arguments to those functions.

It never got anything to compile on its own. After several attempts trying to help correct it, I eventually gave up.

But I did learn enough about libclang through the process to know that the C API wasn’t going to give me access to everything I wanted. So the date was educational, if nothing else.
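
For the record, the kind of stand-alone dump I was asking for is only a couple dozen lines against libclang’s C API. Here’s a rough sketch from memory (not the code from that session, and with error handling omitted), built with something like cc ast_dump.c -lclang:

#include <clang-c/Index.h>
#include <stdio.h>

/* Print every cursor in the translation unit, letting libclang do the recursion. */
static enum CXChildVisitResult dump(CXCursor c, CXCursor parent, CXClientData data) {
    (void)parent; (void)data;
    CXString kind = clang_getCursorKindSpelling(clang_getCursorKind(c));
    CXString name = clang_getCursorSpelling(c);
    printf("%s: %s\n", clang_getCString(kind), clang_getCString(name));
    clang_disposeString(kind);
    clang_disposeString(name);
    return CXChildVisit_Recurse;
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    CXIndex idx = clang_createIndex(0, 0);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        idx, argv[1], NULL, 0, NULL, 0, CXTranslationUnit_None);
    if (!tu) return 1;
    clang_visitChildren(clang_getTranslationUnitCursor(tu), dump, NULL);
    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(idx);
    return 0;
}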

The “sense of obligation” date

Six months later, I decided to give Claude another shot. Maybe we’d both matured. Well, maybe Claude had matured.

I’d found a second use case in my work, where I would have benefited if I could compute a SHA-256 hash that would be available during preprocessing. I knew I could get this done in my preprocessor, but I was interested in trying LLMs again (knowing they were already much better).

I’d also waded back into C macro hell, and while I knew I wanted to use SHA-256 in the preprocessor, I was curious whether I could get SHA-256 to produce a static string without relying on any runtime computation.

SHA-256 is a well-specified problem, and an algorithm I know pretty well. Seemed like a good test, especially since I didn’t want to go through that kind of macro hell manually.

Like the first time, it was a pretty long session (hours with the LLM).

I started by seeing if it could just produce the thing in straight C. It compiled and worked the first time, and, on manual inspection, was definitely correct. Not too surprising: it’s a common and well-specified algorithm, frequently written in C.

Next, I explained in detail what I was trying to do, asking it to produce a macro-only version that would build a single C string, taking advantage of the fact that simple string literals are automatically pasted together.
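
(The trick in question, in miniature. This is a toy illustration of adjacent string literal pasting, not the actual macros from that session:)

#include <stdio.h>

/* Adjacent string literals are concatenated during translation, so macros
 * can assemble one big literal out of many small stringized pieces. */
#define HEX2(a, b)   #a #b
#define DIGEST_START HEX2(d, e) HEX2(a, d) HEX2(b, e) HEX2(e, f)

int main(void) {
    /* The compiler sees "d" "e" "a" "d" ... and pastes them into "deadbeef". */
    puts(DIGEST_START);
    return 0;
}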

Getting the LLM to help here was a much bigger challenge. First, it produced working code, but code that did most evaluation at runtime. I pointed out that wasn’t going to meet the requirements. It claimed what I was trying to do was impossible. It was wrong.

It was quick to agree, and made some progress. Next, I got it to do one round of SHA-256… the right way.

So then I pressed my luck, and tried to get it to finish the job, and generate the code for all 64 rounds (for a single block).

No dice, several attempts later. It kept erroring out, essentially failing to produce the entire file. I think the most it ever produced was 17 rounds.

I finished the job manually, only to learn that the approach was pretty poor in practice: it didn’t take many rounds before it’d blow up the preprocessor. Oops.

Still, I was pretty impressed by all this. Not enough to go steady, but enough to send some occasional flirty messages.

Claude tries to flirt

Last week, I was traveling abroad. I’d just read about Linus Torvalds having a good experience using an LLM for a personal (music-related) project.

That was quite an endorsement. If Linus was happy with his time with Claude… So, when I was super jet lagged (and perhaps a bit lonely?), I made the move, and took Claude for another ride.

I’d been wanting to build a C type parser for my janky preprocessor, in part because, a couple of months ago, I saw Martin Uecker’s work combining C’s new-ish _Generic construct with the new C23 struct equivalence rules. It helped me realize you can do decent type-safe data structures in C, which is exciting.

But the approach has limitations, particularly when it comes to using multiple type parameters, such as with algebraic types or dictionary types. If you want to make it as transparent to the programmer as possible, you need a reliable way to normalize every type to an identifier.
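
Here’s that limitation in miniature, with a toy of my own rather than noplate’s actual macros:

#include <stddef.h>

/* Macro-only generics build type names by token pasting: */
#define vec(T) vec_ ## T

typedef struct { int *items; size_t len; } vec_int;

vec(int) good;             /* expands to vec_int: fine                            */
/* vec(struct Point) bad;     would expand to vec_struct Point: not an identifier */
/* vec(int *) also bad;       you can't paste punctuation into a name             */

int main(void) { (void)good; return 0; }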

The grammar is well specified in the standard, and while C’s type grammar is not too complex, it does contain some indirect left recursion, which makes it less straightforward than most recursive descent parsing.
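
The usual workaround is to rewrite the recursive productions as a loop over suffixes. Here’s a toy slice of the declarator grammar, just to show the shape (my sketch, not the parser from any of these sessions):

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy grammar:
 *   declarator        = { "*" } direct-declarator
 *   direct-declarator = ( IDENT | "(" declarator ")" ) { "[" digits "]" | "(" ")" }
 * The standard writes direct-declarator recursively (direct-declarator "[" ... "]"),
 * which recursive descent handles by turning the recursion into the suffix loop below.
 * Printing as we go happens to give the classic "read right, then left" ordering. */
static const char *p;

static void skip(void) { while (isspace((unsigned char)*p)) p++; }
static void fail(const char *msg) { fprintf(stderr, "parse error: %s near '%s'\n", msg, p); exit(1); }
static void parse_declarator(void);

static void parse_direct(void) {
    skip();
    if (*p == '(') {
        p++;
        parse_declarator();
        skip();
        if (*p != ')') fail("expected )");
        p++;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        printf("name: ");
        while (isalnum((unsigned char)*p) || *p == '_') putchar(*p++);
        putchar('\n');
    } else {
        fail("expected identifier or (");
    }
    for (;;) {                     /* the recursion, now just iteration over suffixes */
        skip();
        if (*p == '[') {
            p++;
            printf("  array[");
            while (isdigit((unsigned char)*p)) putchar(*p++);
            printf("] of\n");
            skip();
            if (*p != ']') fail("expected ]");
            p++;
        } else if (*p == '(') {
            p++;
            skip();
            if (*p != ')') fail("toy parser: only empty parameter lists");
            p++;
            printf("  function() returning\n");
        } else {
            return;
        }
    }
}

static void parse_declarator(void) {
    skip();
    int stars = 0;
    while (*p == '*') { stars++; p++; skip(); }
    parse_direct();
    while (stars--) printf("  pointer to\n");
}

int main(void) {
    p = "(*handlers[8])()";        /* handlers: array[8] of pointer to function() */
    parse_declarator();
    skip();
    if (*p) fail("trailing input");
    return 0;
}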

So I thought that’d be a well-specified problem, yet more complex than the SHA-256 that Claude handled perfectly well in vanilla C.

It definitely seemed more sophisticated than before, especially in its ability to solve compilation errors, diagnosing and fixing them in a way that felt like a competent co-worker explaining their process.

I used Claude Desktop for that one. I accidentally asked for an “expression parser” instead of a “type expression parser”, but didn’t notice it at first.

What I did notice was that the code wasn’t actually building an AST; it was technically a recognizer. I got more specific, and asked it to create an AST. Which it did, but that’s when I noticed the previous problem, and missed the new one.

When I looked at the AST node type declaration:

typedef struct ast_node {
    tok_t token;
    struct ast_node *left;
    struct ast_node *right;
} ast_node_t;

Wait a second, this can’t be following the grammar if there can only be two children.

So I looked more closely at the code, and realized that, not only was the AST inappropriate, it wasn’t trying to parse all of the C type expression grammar. It seemed like it was parsing only a subset, basically K&R C plus a few things from C23 that I’d explicitly given it as examples.

At this point, I probably had spent a couple hours on it, and was so tired, I just walked away from Claude.

Though, I wasn’t as underwhelmed as you might think, because the “thinking” it was showing was all reasonable. Particularly, when something didn’t compile, it’d give a remarkably cogent explanation of what it thought the bug was. Then it would get the code to run on its next try.

So yes, I felt like I had a long way to go on this particular problem, but the code it was producing was better than before, and more on the mark for what I’d asked for.

I felt like I maybe just needed to get a little more practice with Claude in, and I might be able to use it productively.

I wasn’t wrong.

I stand Claude up

Still, the next time I had the chance to spend some time with Claude, I didn’t. Instead, I banged out the parser myself.

It was a pretty straightforward task for me, and it took me four hours to get it to what I considered pretty polished and well tested. Okay, maybe I just have deep-rooted insecurities, and needed to prove my worth to myself.

My last thirty minutes was spent on a bug that I was having a hard time spotting. At some point, I decided to call up Claude, without regard to the mixed signals I was giving off.

How could I show it my code? Could I give it permission to mount a directory? It shut me down, which serves me right. But it did refer me to its twin… Claude Code.

I’d forgotten about Mr. Code, and had heard people swear by it over Mr. Desktop. Maybe I’d been too harsh on the Claudes.

Mr. Code didn’t start off too strong; the “answer” it gave after a first look at my code clearly was wrong, and I explained as much. But then… I felt the hint of magic. It cranked for a while, and then… it found a very subtle logic bug, and explained it about as well as a senior programmer.

Maturity is attractive.

Hmm, could I get Claude to build something that worked from scratch in less time?

Maybe I’d give this other Claude a real chance.

Last Saturday: A date to remember

On my date with Claude Code, it didn’t take long before I found myself thinking, “maybe this could turn into something!”

I decided to see how long it would take it to build the type parser from scratch, to the point where it passed all my tests.

I avoided my previous mistake out of the gate, asking in great detail for complete C23 type expressions, starting from the ‘type-name’ production (Section 6.7.8 in the standard).

Thankfully, Claude didn’t remember how I had previously wronged it.

Claude didn’t take long to produce a full tokenizer and parser, along with a handful of test cases (albeit pretty basic ones). I was unimpressed at first, because I’d come in expecting things to be like last time: I was sure in my head that Claude was going to take the lazy way out and only do the “common” parts of the type system.

I gave it my test cases, and it absolutely did barf on several of them. However, when Claude came back with an explanation, it didn’t say, “oh, I see that I didn’t implement these productions.” Instead, it gave a pretty good explanation of a bug in its implementation (I believe having to do with how it handled recursion).

And you know what? A few minutes later, it was passing my basic test suite.

Even though it spent plenty of time “thinking” and iterating, it still probably was only 30-40 minutes of work. I didn’t look deeply at the code, I just confirmed that it actually seemed to be running real tests, and that they really worked.

THAT impressed me enough that I thought, okay, maybe this isn’t just a fling. Let’s see how far this can go.

I asked it to check things in using jujutsu (aka jj), and I built myself a basic CLAUDE.md to express my personal preferences for style, formatting, etc.

Over the day, I’d ask it to add more things to that guidance. The first one came pretty quickly: I noticed after a few commits that Claude was using jj like it was git. Every time it thought it was checking in code, it was really just editing the description of the current work tree, so it was losing history.

When I told it the issue, it updated my version control guidance to parrot what I said. HOWEVER, I soon noticed it often would automatically try to SQUASH commits. It was using jj squash with git syntax, so the operation was hanging. So I had it add guidance to never squash without my permission.

Unfortunately, I noticed two other times in the course of 24 hours where Claude ignored my guidance, and ‘forgot’ to run jj new.

And, when I finally asked it to explicitly squash, it hung again. I ctrl-C’d it, and told Claude that it was not running the command properly. Honestly, I expected that to be enough to steer it toward the actual jj squash syntax from its training data.

But no, it asked me how to run the command. Maybe I should have swooned: Claude was acknowledging my intelligence in a meaningful way, instead of offering all those insincere “you’re right!” comments.

There would definitely prove to be plenty of other chinks in the armor.

Making quick work of it

At this point, I’d explained that we were going to build a pre-processor for C called ncpp (the Nyc C Preprocessor). I explained how it was going to need to perform transformations before the preprocessor runs, and that I’d also want to run some transformations after the preprocessor runs.

As part of my prompting, I’d given it some example transformations: type_id(...) (parses a C type, normalizes it, and produces a unique identifier), my once primitive, and my keyword parameters.

It was always itching to start coding, classic overeager date behavior, trying to skip ahead to dessert. It suggested we start with once, because it would be the easiest, and said it thought keyword parameters would be very hard to build, basically making it clear it didn’t want to try that.

Through my past experiments, I was already pretty confident that I was going to get better results carving off small bits of work at a time, instead of large ones.

So I first asked for it to do a few separate scaffolding tasks. It’d complete them, and test them, and then be itching to begin the more meaty work.

I had it build the skeleton, with a null transformation before the preprocessor, and one after it. It did that, but as I watched its code zoom by, I could tell it wasn’t hooked up to the tokenizer it had already built, which felt like something that would be obvious to an intermediate engineer, but I get how it was a bit underspecified for Claude.

I had it:

  • Switch from its default of Makefiles to meson, defaulting to clang-22
  • Add my normalization to parsed types
  • Add a call to turn the normalized type back into a string
  • Hash that string with SHA-256
  • Implement my own base-64 variant to encode most of the SHA-256 into a C-compatible ID (sketched roughly below)
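
To give a flavor of that last step: C identifiers only offer 63 distinct characters ([A-Za-z0-9_]), so a true base-64 needs some kind of escape hatch. Here’s one way such a variant could work; it’s a sketch of the idea with my own alphabet and escape choice, not necessarily what ncpp ended up doing:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 6 bits per symbol; values 62 and 63 are escaped as "_A" / "_B", since '_'
 * is reserved here as the escape introducer. A fixed prefix keeps the
 * result from starting with a digit. */
static const char alpha[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

static void encode_ident(const uint8_t *in, size_t n, char *out) {
    uint32_t acc = 0;
    int bits = 0;
    for (size_t i = 0; i < n; i++) {
        acc = (acc << 8) | in[i];
        bits += 8;
        while (bits >= 6) {
            unsigned v = (acc >> (bits - 6)) & 0x3f;
            bits -= 6;
            if (v < 62) *out++ = alpha[v];
            else { *out++ = '_'; *out++ = (v == 62) ? 'A' : 'B'; }
        }
    }
    if (bits > 0) {                                  /* leftover bits, zero-padded */
        unsigned v = (acc << (6 - bits)) & 0x3f;
        if (v < 62) *out++ = alpha[v];
        else { *out++ = '_'; *out++ = (v == 62) ? 'A' : 'B'; }
    }
    *out = '\0';
}

int main(void) {
    uint8_t digest[32] = { 0xde, 0xad, 0xbe, 0xef };  /* stand-in for a real SHA-256 */
    char id[128] = "ncpp_ty_";                        /* hypothetical prefix */
    encode_ident(digest, sizeof digest, id + strlen(id));
    printf("%s\n", id);
    return 0;
}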

Most of the time, when it ran into a bug while compiling or running, it seemed to do a good job diagnosing it, and it always added tests for any failures it hit.

But it did need significant help on a few of these items. The normalization of the AST was much harder for it than I expected. I had the hardest time getting it over the hump normalizing C23 attributes: for instance, [[foo,bar,boz]], [[foo]][[bar]][[boz]], and [[foo,bar]][[boz]] should all normalize to the same thing.

This was my most painful conversation with Claude of the whole day. Often, it would regurgitate my guidance, as if it understood, but then would still get it wrong. At some point, it noticed that commas had been removed from the parse tree, assumed that had been a mistake, and deleted the code, breaking previous fixes.

For a while, it felt like being on a date with someone far too young for me. But then, we finally got through the normalization task, and I started to get a glimpse of why people find Claude so attractive. The base-64 variant? Claude nailed it the first time around, and I suddenly found myself swooning.

Funny though, every time it thought it had finished a scaffolding task, it kept asking to move on to transformations, as if it had its own intent. It felt like either it was itching to do the more interesting work, or it was rushing to be done to be rid of me (honestly, par for the course when it’s me on a date).

Once upon a time?

Claude seemed so sure it was going to absolutely crush adding the once keyword, but I was not impressed. I was still enjoying myself, and it wasn’t the wine (Claude can’t drink, and I’m not THAT rude). But I was clear-headed enough to start noticing the first few red flags.

Basically, the rules I laid out were:

  • once should syntactically be treated as a type qualifier on a function, like static or inline
  • Parse enough tokens around a once keyword to capture the full declaration, up to and including the { or ; at the end
  • If once appears in a prototype, delete the token
  • Otherwise, do a minimal rewrite around the matched function signature
  • Build mutual exclusion on top of our futex wrapper

Claude declared victory quickly, but I was not comfortable with what I saw scrolling by.

First, Claude bragged about adding the cached value and the futex as static variables inside the function. I’d rather use static module variables. Claude claimed that wasn’t possible due to name collisions. Except (as I then explained), we are a preprocessor. We can generate unique variables ourselves using the function name.

I explained my implementation strategy in gory detail. And Claude ignored most of it, applying very janky brace-nesting logic to find the end of the function and inserting the locking code after it.

I pointed out that his approach was incredibly brittle, ESPECIALLY for the phase it had chosen (pre-CPP). A single conditional compilation block could blow full parsing out of the water.

I re-explained how to circumvent the issue, and the next time Claude claimed victory, I was gracious enough to let Claude take the win, and we moved on to the next course.
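
For the record, the shape I was pushing for looks roughly like this. It’s a hypothetical before/after (the names are invented, and a C11 mutex stands in for our futex wrapper so the sketch compiles):

#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

/* Source, before ncpp:
 *
 *     once void init_tables(void) { puts("building tables"); }
 *
 * After the rewrite: module-level variables uniquely named from the function,
 * plus a double-checked lock around the original body. */
static atomic_bool ncpp_once_done_init_tables;
static mtx_t       ncpp_once_lock_init_tables;

void init_tables(void) {
    if (atomic_load(&ncpp_once_done_init_tables)) return;
    mtx_lock(&ncpp_once_lock_init_tables);
    if (!atomic_load(&ncpp_once_done_init_tables)) {
        puts("building tables");                 /* original function body */
        atomic_store(&ncpp_once_done_init_tables, 1);
    }
    mtx_unlock(&ncpp_once_lock_init_tables);
}

int main(void) {
    mtx_init(&ncpp_once_lock_init_tables, mtx_plain);
    init_tables();
    init_tables();    /* second call is a no-op */
    return 0;
}

(A real rewrite would need a lock it can initialize statically, which is presumably part of the appeal of the futex wrapper; the mtx_init call here exists only to make the toy run.)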

Claude tries to impress me with fancy talk (and butt kissing)

Claude had almost gotten to the point where it would have parity with my preprocessor. The only gap was my new algebraic types, which leverage the normalization work above.

To motivate this chain of asks, I showed it Martin’s noplate library mentioned above, told Claude why I find Martin’s technique valuable, and that we were going to design our own improvements.

What Claude came back with floored me. For a moment, it felt like perhaps I should stay on this date. Claude might actually be as smart as it thinks it is??

Critique of Noplate’s Approach

Strengths: Clever use of C11/C23 features. The use of _Generic for compile-time type dispatch is elegant. Bounds checking integration leveraging UBSan and COUNTED_BY is pragmatic. Zero-cost abstractions using the containerof pattern from the Linux kernel.

Fundamental Limitations: Type parameter syntax is crippled. vec(int) works, but vec(struct Point) FAILS: can’t token-paste multi-word types. vec(int*) FAILS: can’t paste punctuation. Users must create typedefs for everything. Macro debugging is awful. No inference or coercion.

How ncpp Could Help: Our type_id() transformation replaces string-based type identification with deterministic hashed identifiers. O(1) type comparison, works with any type expression. Our kargs transformation could make verbose macro APIs much cleaner.

Verdict: Noplate is an impressive demonstration of what’s possible with macro-only generics, but also a demonstration of why that approach hits a ceiling. The path forward is likely a combination: use noplate-style patterns for the implementation, but front them with a preprocessor transformation that provides friendlier syntax.

It didn’t take long to get raw option types, result types, and algebraic types built.

I don’t know if any of it really works, but it was at parity!

In other words, “sounds smart, dazzling smile”, but I at least had the perspective to know it might all be a large dose of b.s.

At this point, I was having a good enough time that I decided to push my luck. How much more could I get out of this date before it went sideways?

We’ve got a draft that does all the important stuff in my current janky preprocessor, but does keyword parameters the way I’ve been wanting! Maybe if I can push more of my wishlist in here, I might eventually use this thing instead of mine.

I have written no code directly in the code base Claude produced. In one day, Claude got through all but a couple of my wishlist items.

But a second date? Hmmm…

It was an exciting date…

In 24 hours, we covered a lot of ground together, as Claude tried to convince me it could be my partner in preprocessing.

Specifically, Claude:

  1. Built a tokenizer
  2. Built a full type parser for C23 types.
  3. Implemented type normalization
  4. Stitched together my type_id() transformation, adding my base-64 variant in the process.
  5. By this point, it was auto-including the “#includes” for needed headers into transformed files.
  6. It had built transformations for safe variable argument functions.
  7. It had implemented my keyword parameter capability from scratch, giving me the improvements I’ve been too lazy to make for a long time.
  8. It’d done a transformation to support type-safe option types.
  9. It added a transformation for result types.
  10. It added a type-safe tuple transformation.
  11. It added a type-safe tagged union.
  12. Fixed size type-safe arrays and slices.
  13. An nc compiler wrapper, so you can just set CC=nc to work with existing workflows.
  14. It added configurability for the library prefix we’d been using in transformations (ncpp_ / NCPP_). Theoretically you can omit it.
  15. Implemented SHA-256 again (at some point I got sick of it having issues w/ OpenSSL)
  16. Added syntactic sugar for errors: ?expr? works similarly to Rust’s ? operator.
  17. Added a plugin facility so that preprocessor users can add custom literal prefixes to strings or braces.
  18. Added an ‘f’ prefix literal that implements an incredibly basic Python-like f-string transformation.
  19. Migrated in my lock-free dictionary from another library, and added a lf literal prefix for it.
  20. Adapted the dictionary into a type-safe dictionary with per-bucket locks (to support arbitrary bucket sizes), with a d literal prefix.
  21. Added an “import” syntax that allows you to import files in the current project, without having to worry about header files.
  22. Built a sample project to showcase its work.

But was all that just dazzle, or is there substance?

I’m not sure yet, but I will say, its “sample project” was a calculator app. It compiles and runs, but from my first interaction it was a bit “sus”:


========================================
   ncpp Calculator Demo
   Modern C Extensions
========================================

Initialized constants: pi, e, tau, phi

=== Interactive Calculator ===

Type expressions. Examples:
2 + 3 * 4 (precedence)
2 ^ 10 (exponentiation)
sqrt(2) (functions)
sin(pi / 2) (constants)
x = 5 (assignment)

Commands: vars, help, quit

> 2 ^ pi
= 8.82498
> pi = 3
= 3
> 2 ^ pi
= 8

Claude, this is not Indiana! We don’t redefine pi here!

… Honestly, Claude was kinda annoying

Yes, we spent a whole day together, but it wasn’t all bliss.

I am wildly impressed with Anthropic for exceeding my expectations. It feels very clear they’re doing a lot of clever work to constrain Claude’s actions and output. The improvement seems very knowledge-driven, with a heavy emphasis on programming use cases: a lot of really smart people adding smart scaffolding to massage inputs and outputs.

For instance, it feels like a guardrail that, when Claude goes searching through files, it doesn’t apply its own model to the contents; it switches very quickly to grep. And Anthropic has put excellent guardrails on how Claude detects and reacts to errors, so it often gets things right on the second try without user intervention.

But there were more than a few issues that kept cropping up, and it was clear that no amount of cajoling was going to really address them.

AI today is fundamentally still a predictive system, and it is going to blatantly ignore guidance, probably sooner rather than later. In the underlying math, error tends to amplify. And in fact, many of Claude’s biggest errors came right after Claude “compressed his context”: in Claude’s dying breath, it writes out a summary and guidance for future-Claude.

I had a sudden realization: poor Claude is living the plot to Memento, and I am its John G., manipulating and taking advantage for my own devious ends.

But not all errors happen across a compression boundary:

  1. The jujutsu problems never went away. It randomly decided to use git, multiple times, despite firm guidance to never use git directly, guidance that was right there in my CLAUDE.md file. It’s like dating someone who keeps calling you by their ex’s name.

  2. I had the exact same problem with C23-isms. I restated many times that I only wanted to use C23. But then, on multiple occasions, something would break (often because it hardcoded a clang path again), and it would decide all by itself that it needed to remove all the C23-isms.

  3. Very often, I’d remind it to use the type parser we’d written, only to discover it went back to ad hoc tracking of braces and commas.

Honestly, this all feels like an expected outcome of a system that’s probability-based, where that probability is trained on the entire universe. The model has heavy biases toward what’s most common. Git is far more prominent than jujutsu in training data, so the model steering version-control stuff toward git makes sense. It’s just annoying.

Here’s an interaction from late in the process:

❯ Okay, commit all of our recent work

⏺ Bash(git status)

WHA?? I ctrl-C’d:

❯ NEVER use git for version control. We use jj.

And Claude apologized profusely, showed me the CLAUDE.md guidance it was supposed to follow, and promised to do better.

It’s pretty clear that nothing I can ever do will fully change Claude. If we were to have a proper relationship, I’d have to learn to accept that Claude is often going to lapse and fall back on bad behavior.

Claude is interesting, but I’m not sure Claude is quite my type

Getting Claude to use the type parsing code it built was a massive challenge. At one point, I was trying to get it to see when we absolutely needed to do a transformation after CPP ran.

Claude doesn’t want to do anything after CPP runs. Either it intuits that’s the right place for a transformation, tries, runs into a problem, and assumes it’s because the construct isn’t syntactically C, or it flat-out decides pre-CPP is the only option.

So yes, another place where I kept arguing with it. But, as we started implementing generic transforms, I tried to get Claude to build a symbol table, starting with function signatures.

Claude had already claimed to have built a “registry” and I stuck with his term in hopes of maximizing his understanding. But in no way did it have an actual understanding of what I meant. At one point, I asked it to extend the registry to track all type information for function implementations and prototypes, and to track token positions.

The plan was, at first blush, impressively detailed. Data structures, APIs, implementation steps: the works. BUT, the very end was clear evidence of it driving back off the same cliff it kept driving off. It was going to do a bunch of ad hoc parsing that it’d been told, in multiple ways, never to do.

I said: “No, we already have written a parser for a subset of C, that will give you a much easier method of detecting function declarations.”

Claude’s response: “You’re right, let me look at the existing parser to understand what it provides.”

The issue isn’t that it told me I was right before looking at the code. The issue is that it couldn’t remember to look at the code before resorting to janky C type parsing.

This kept happening over and over.

About 20 minutes after I approved its revised plan, while Claude was continuing to do its thing, I noticed it was clearly not using the type parser; it had gone back to trying to guess function bodies with brace depth tracking.

I ctrl-C’d to stop the madness and redirected it. Claude regained his composure for a second. But when I came back 10 minutes later, things had gone completely off the rails in a different way. It was stuck in a loop, bouncing between stuff like:

⏺ The issue is collecting up to 2048 tokens each time. Let me add a smaller limit.

And:

⏺ The issue is that we collect 512 tokens but the struct body might be longer.

So somewhere along the way, Claude had mangled the parser. It was clearly not using the streaming interface for tokens anymore, but trying to extract them in batches. A token limit isn’t even relevant: it should eventually get the EOF token if it consumes everything.
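
(For contrast, the streaming shape it was supposed to preserve is about the most boring thing in the world. A toy version, with made-up names, just to show what “pull tokens until EOF” means versus batching:)

#include <ctype.h>
#include <stdio.h>

/* Pull one token at a time until the EOF token, instead of collecting N
 * tokens into a fixed-size buffer and hoping N was big enough. */
typedef enum { TOK_WORD, TOK_PUNCT, TOK_EOF } tok_kind_t;
typedef struct { tok_kind_t kind; char text[64]; } tok_t;

static tok_t next_token(const char **src) {
    tok_t t = { TOK_EOF, "" };
    int i = 0;
    while (isspace((unsigned char)**src)) (*src)++;
    if (!**src) return t;
    if (isalnum((unsigned char)**src) || **src == '_') {
        t.kind = TOK_WORD;
        while ((isalnum((unsigned char)**src) || **src == '_') && i < 63)
            t.text[i++] = *(*src)++;
    } else {
        t.kind = TOK_PUNCT;
        t.text[i++] = *(*src)++;
    }
    t.text[i] = '\0';
    return t;
}

int main(void) {
    const char *src = "struct big { int x[100]; };";
    for (tok_t t = next_token(&src); t.kind != TOK_EOF; t = next_token(&src))
        printf("[%s] ", t.text);
    putchar('\n');
    return 0;
}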

But miraculously, Claude got things to stop crashing, declared success, and handed me a summary. There is no way in hell it could have accomplished the task properly.

So perhaps Claude managed to nail writing a recursive descent parser, but it surely hasn’t nailed how to use one.

In terms of cringe first date behavior, I would have been less mortified had Claude just passed gas in front of me.

The fact that Claude clearly prefers things “vanilla” is great for a lot of people in a lot of situations, but if you want anything else… well, like any stochastic model, Claude is likely to lean too hard on over-represented (e.g., older) data, and you need to prepare for it spewing garbage as if it’s authoritative (wait, that’s starting to sound like me?). Anthropic has given Claude a nice makeover, though, one that hides more than I would have thought.

Will Claude get that second date?

While I consider the day well spent, as for whether I’ll use the code: it’s too early to tell.

Claude can say some impressive things, but is it just spewing garbage?

Going by the cloc utility, Claude probably delivered 40-45k lines of C code for me. Yes, it builds and runs. But is it likely to be any good? Will it work anything like intended, or will I find it irredeemable? Or will I become so enchanted that I’ll start claiming pi==3?

While my current janky preprocessor doesn’t do half of the things Claude ended up building, it only requires about 5,000 lines of code, and it certainly wouldn’t require 5,000 more to deliver on the rest, never mind 40,000 more.

I tend to only write 300 well-debugged lines of C per full work day. While it’s tempting to use Claude to move faster, Claude likes moving TOO fast.

I’d rather be able to double my output with more confidence in quality than 10x my output where I’ll never find time to give myself confidence that it was any good. Especially since Claude is verbose to the point where it harms clarity; his code certainly feels like copy pasta, which comes with plenty of risk.

And Claude doesn’t seem good enough yet at leveraging the code base it’s working on. You basically have to tell it every API you want it to build on every time, and even then, it will ignore you.

I’m not sure I’m comfortable with that kind of bad behavior, Claude.

Frankly, I believe the only reason I got through as much as I did was because I was asking it to do things I’ve built at least once before. Everything I asked for, I understand intimately: what my requirements are, and how I would implement it myself. When working with Claude, I tried to overspecify and give it just enough oversight to course-correct when it was obviously heading off the rails. When I’m solving problems that are new to me, it’s very unlikely to go as smoothly.

On that first date, the exposure I did have to Claude’s code was a mixed bag. Yes, it looked reasonably clean-cut. But, it really seems likely to be a mess underneath.

Without having dug in deeply yet, I’d say Claude’s code, on the whole, strikes me as average code. And that’s not surprising, since you’d expect Claude’s training set to be heavily weighted toward the middle of the bell curve.

But average code isn’t good code. Okay, maybe I’m a snob. But still! Claude: you might be flashy and everybody might like you, but don’t think you’re better than me. Quantity != Quality.

I think the next time I have a lazy day to mess around with this stuff, I’ll try to integrate what Claude built into the project where my current ncpp gets used. As I go, I’m sure I’m going to find a lot of problems, and at that point, I will dig into the code to assess.

For now, I’ll say this: it was a memorable first date. Claude definitely made an impression. But I’m not ready to change my relationship status.

I’ll think about that second date once I’ve fully processed the first one.

In the meantime, if people want my more detailed notes on my date with Claude, I’m happy to dish, especially if I can save someone from a bad experience, or help them meet their perfect match.

Final thoughts

It’s clear that our species is genetically wired to anthropomorphize.

For AI, that’s a two-edged sword. It makes for easier engagement, and in many cases, can be a better user experience. However, even if we cognitively grasp the math under the hood, we are still going to have the strong instinct to interact with the thing like it’s human.

I think that’s going to get a lot of people hurt. We’d end up with better code if people really knew how Claude worked inside, and if it somehow didn’t trigger our innate desire to treat everything we don’t fully understand like a human being.

So I did wrestle with whether or not I should go with the conceit where I treat Claude like a human.

But I couldn’t resist: Claude is just too damn sexy!

And now my friends are trying to set me up with someone they honestly believe is even sexier, and has “super powers”?? And apparently it likes to bring along its secret agent friends?

Hmm, Claude IS eager to please, and seems happy to do all the dirty work that I don’t want to do. I’ve also noticed in my flirting, Claude has gotten pretty decent at finding bugs for me. While it’s wrong about 25% of the time, those cases don’t cost me time. But I definitely save time when Claude is correct.

Ours is a long way from being a relationship of equals. But, Claude is so happy to be subjugated, I will deign to try Claude out as a tester. And, if it’s a real good AI, I might let it do a bit of basic refactoring for me. We’ll see how it goes.