As reports from across the industry indicate, there’s now a growing specialty for engineers who focus on fixing errors in AI-generated code.
The pattern has become remarkably consistent. Companies turn to ChatGPT to generate migration scripts, integrations, or entire features, hoping to save time and cut costs. After all, the technology appears fast and accessible.
Then the systems fail.
And they call us.
Recently, we’ve been getting more and more of these requests. Not to ship a new product, but to untangle whatever mess was left behind after someone trusted a language model with their production code.
At this point, it’s starting to look like its own niche industry. Fixing AI-generated bugs is now a billable service. And in some cases, a very expensive one.
GitClear’s 2024 report confirms what we’ve seen with clients: AI coding tools are speeding up delivery, but also fueling duplication, reducing reuse, and inflating long-term maintenance costs.
Let’s be clear, however: we’re not “against AI.” We use it too, and it’s helpful in the right context, with the right guardrails. But what frustrates me about the overreliance on AI and its widespread implications, and probably frustrates you too, is the magical thinking: the idea that a language model can replace real engineering work.
It can’t. And as the saying goes, the proof is in the pudding. When companies pretend otherwise, they end up paying someone like us to clean it up.
So, what does one of these clean-up jobs look like? Here’s what the AI aficionados aren’t telling you about the time lost and the money wasted.
The message usually comes in like this:
“Hey, can you take a look at a microservice we built? We used ChatGPT to generate the first version. We pushed it to staging, and now our RabbitMQ queue is completely flooded.”
But here’s the thing: the symptoms show up much later. Sometimes days later. And when they do, it’s rarely obvious that the root cause was AI-generated code. It just looks like… something’s off.
After a dozen of these cases, patterns start to emerge.
And of course, when everything collapses, the AI doesn’t leave you a comment saying, “By the way, I’m guessing here.”
That part’s on you.
This one came from a fast-growing fintech company.
They were rolling out a new version of their customer data model, splitting one large JSONB field in Postgres into multiple normalized tables. Pretty standard stuff. But with tight deadlines and not enough hands, one of the developers decided to “speed things up” by asking ChatGPT to generate a migration script.
It looked good on the surface. The script parsed the JSON, extracted contact info, and inserted it into a new user_contacts table.
So they ran it.
No dry run. No backup. Straight into staging, which, as it turns out, was sharing data with production through a replica.
A few hours later, customer support started getting emails. Users weren’t receiving payment notifications. Others had missing phone numbers in their profiles. That’s when they called us.
We traced the issue to the script. It did the basic extraction, but it made fatal assumptions along the way. It didn’t handle NULL values or missing keys inside the JSON structure, and it used ON CONFLICT DO NOTHING, so any failed inserts were silently ignored.

Result: about 18% of the contact data was either lost or corrupted. No logs. No error messages. Just quiet data loss.
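For contrast, here’s roughly the shape a safer version of that extraction could take. This is a minimal sketch, not the client’s actual script or our fix: the source table and column names (users, contact_data) are assumptions, and psycopg2 is used for brevity. The point is the guardrails the generated script skipped: a dry-run mode, explicit handling of NULL and missing keys, and inserts that fail loudly instead of being swallowed by ON CONFLICT DO NOTHING.

```python
import logging

import psycopg2
from psycopg2.extras import execute_values

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("contacts_migration")


def migrate_contacts(dsn: str, dry_run: bool = True) -> None:
    """Copy contact info out of a JSONB column into user_contacts,
    logging every row that can't be migrated instead of dropping it."""
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            # Hypothetical source table and JSONB column.
            cur.execute("SELECT id, contact_data FROM users")
            rows, needs_review = [], []
            for user_id, payload in cur.fetchall():
                # Don't assume every record has the same shape:
                # guard against NULL JSONB and missing keys.
                data = payload or {}
                email = data.get("email")
                phone = data.get("phone")
                if not email and not phone:
                    needs_review.append(user_id)
                    continue
                rows.append((user_id, email, phone))

            log.info("Would migrate %d rows; %d need manual review",
                     len(rows), len(needs_review))
            if needs_review:
                log.warning("Users without usable contact data: %s", needs_review)

            if dry_run:
                log.info("Dry run only; nothing written.")
                return

            # Deliberately no ON CONFLICT DO NOTHING: a duplicate key should
            # fail loudly here, not disappear silently.
            execute_values(
                cur,
                "INSERT INTO user_contacts (user_id, email, phone) VALUES %s",
                rows,
            )
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

Run with dry_run=True against a backup first, a script like this surfaces the problem records before anything is written.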
We assigned a small team to untangle the mess: two engineers, three days. Cost to client: around $4,500 in service fees.
But the bigger hit came from customer fallout. Failed notifications led to missed payments and churn. The client told us they spent at least $10,000 on support tickets, SLA compensation, and goodwill credits over that one botched script.
The ironic thing is that a senior developer could’ve written the correct migration in maybe four hours. But the promise of AI speed ended up costing them two weeks of cleanup and reputation damage.
This one came from a legal tech startup building a document management platform for law firms. One of their core features was integrating with a government e-notification service — a third-party REST API with OAuth 2.0 and strict rate limiting: 50 requests per minute, no exceptions.
Instead of assigning the integration to an experienced backend dev, someone on the team decided to “prototype it” using ChatGPT. They dropped in the OpenAPI spec, asked for a Python client, and got a clean-looking script with requests, retry logic using tenacity, and token refresh.
Looked solid on paper. So they shipped it.
Here’s what actually happened: the generated client never checked the X-RateLimit-Remaining or Retry-After headers. It just kept sending requests blindly, and the integration went down.

To get it back up, we rebuilt the client on httpx.AsyncClient, implemented a semaphore-based throttle, added exponential backoff with jitter, and properly handled Retry-After and the rate-limit headers.

Two engineers, spread over two and a half days. Cost to client: around $3,900.
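For a sense of what “properly handled” means here, below is a minimal sketch of that pattern, not the actual client code: a shared asyncio.Semaphore caps concurrency, a pacing lock keeps calls under the 50-requests-per-minute budget, and 429 responses are retried with exponential backoff plus jitter, honoring Retry-After when the API sends it. The endpoint, limits, and names are placeholders.

```python
import asyncio
import random

import httpx

MAX_CONCURRENCY = 5           # concurrent requests allowed in flight
REQUESTS_PER_MINUTE = 50      # the API's hard limit
MIN_INTERVAL = 60 / REQUESTS_PER_MINUTE

_semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
_pace_lock = asyncio.Lock()
_last_request_at = 0.0


async def _respect_budget() -> None:
    """Space requests out so we stay under the per-minute limit."""
    global _last_request_at
    async with _pace_lock:
        now = asyncio.get_running_loop().time()
        wait = _last_request_at + MIN_INTERVAL - now
        if wait > 0:
            await asyncio.sleep(wait)
        _last_request_at = asyncio.get_running_loop().time()


async def get_with_backoff(client: httpx.AsyncClient, url: str,
                           max_retries: int = 5) -> httpx.Response:
    """GET with throttling, Retry-After support, and jittered backoff."""
    for attempt in range(max_retries):
        async with _semaphore:
            await _respect_budget()
            response = await client.get(url)

        if response.status_code != 429:
            response.raise_for_status()
            return response

        # Honor Retry-After when present (assuming it's in seconds; the
        # header may also be an HTTP date), otherwise back off exponentially.
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            delay = min(60, 2 ** attempt) + random.uniform(0, 1)
        await asyncio.sleep(delay)

    raise RuntimeError(f"Gave up on {url} after {max_retries} attempts")


async def main() -> None:
    # Placeholder base URL; the real service uses OAuth 2.0 on top of this.
    async with httpx.AsyncClient(base_url="https://api.example.gov") as client:
        resp = await get_with_backoff(client, "/notifications")
        print(resp.status_code)


if __name__ == "__main__":
    asyncio.run(main())
```

The exact numbers matter less than the fact that the limits live explicitly in the code, where a reviewer can see them and question them.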
The bigger problem is that their largest customer — a law firm with time-sensitive filings — missed two court submission windows due to the outage. The client had to do damage control and offer a discount to keep the account.
All because a language model didn’t understand the difference between “working code” and “production-ready code.” And just like that, another layer of AI technical debt was quietly added to the stack.
The scary part isn’t that these things go wrong. It’s how predictable it’s all becoming.
Every one of these incidents follows the same pattern. A developer asks ChatGPT for a code snippet. It returns something that works just well enough not to throw errors. They wire it into the system, maybe clean it up a little, and ship it, assuming that if it compiles and runs, it must be safe.
But here’s the catch: Large language models don’t know your system.
They don’t know how your services interact.
They don’t know your latency budget, your deployment pipeline, your observability setup, or your production traffic patterns.
They generate the most likely-looking code based on patterns in their training data. That’s all. There’s no awareness. No guarantees. No intuition for system design.
And the output often reflects that.
What’s worse is that the code looks correct. It’s syntactically clean. It passes linters. It might even be covered by a basic test. But it’s missing the one thing that actually matters: context.
That’s why these bugs don’t show up right away. They wait for Friday night deployments, for high-traffic windows, for rare edge cases. That’s the nature of AI technical debt – it’s invisible until it breaks something critical.
As we mentioned earlier, we use AI too. Pretty much every engineer on our team has a Copilot-like setup running locally. It’s fast, helpful, and honestly, a great way to skip the boring parts.
But here’s the difference: nothing makes it into the main branch without going through a senior engineer, and in most cases, a CI pipeline that knows what to look for.
LLMs are great at repetitive code and boilerplate, and at proposing first drafts of a solution.
What they’re not good at is design. Or context. Or safe defaults.
That’s why we’ve built our workflows to treat LLM output as suggestions, not a source of truth. In practice, that means every AI-assisted change gets the same senior review and CI scrutiny as anything else we ship.
Used right, it’s a time-saver. Used blindly, it’s a time bomb.
We’re not here to tell you to ban AI tools. That ship has sailed.
But giving a language model commit access? That’s just asking for trouble.
Here’s what we recommend instead:
Let them help with repetitive code. Let them propose solutions. But don’t trust them with critical decisions. Any code generated by AI should be reviewed by a senior engineer, no exceptions.
Whether it’s commit tags, metadata, or comments in the code, make it clear which parts came from AI. That makes it easier to audit, debug, and understand the risk profile later on.
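One lightweight way to do this, shown below as a sketch rather than a standard, is a commit trailer plus a small CI script that blocks AI-tagged commits lacking a named human reviewer. The AI-Assisted and Reviewed-by trailer names are an invented convention for this example.

```python
# ci/check_ai_commits.py
# Fail the build if a commit tagged as AI-assisted has no human reviewer.
# The "AI-Assisted:" and "Reviewed-by:" trailers are an invented convention.
import subprocess
import sys


def commit_messages(rev_range: str) -> list[str]:
    out = subprocess.run(
        ["git", "log", "--format=%H%x00%B%x01", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return [m.strip() for m in out.split("\x01") if m.strip()]


def main(rev_range: str = "origin/main..HEAD") -> int:
    failures = []
    for entry in commit_messages(rev_range):
        sha, _, body = entry.partition("\x00")
        if "AI-Assisted:" in body and "Reviewed-by:" not in body:
            failures.append(sha[:10])
    if failures:
        print("AI-assisted commits missing a human reviewer:", ", ".join(failures))
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```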
Decide as a team where it’s acceptable to use LLMs and where it’s not. Boilerplate? Sure. Auth flows? Maybe. Transactional systems? Absolutely not without review. Make the policy explicit and part of your engineering standards.
If you’re letting AI-generated code touch production, you need to assume something will eventually break. Add synthetic checks. Rate-limit monitors. Dependency tracking. Make the invisible visible, especially when the original author isn’t human.
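To make “synthetic checks” concrete, here’s a small sketch of the kind of scheduled probe we mean: it calls an endpoint, reads the remaining rate-limit budget, and complains before the queue starts backing up. The URL and threshold are placeholders, and a real version would page someone instead of printing.

```python
import sys

import requests

# Placeholder endpoint and alert threshold.
API_URL = "https://api.example.gov/health"
MIN_REMAINING = 10


def check_rate_limit_headroom() -> int:
    resp = requests.get(API_URL, timeout=10)
    resp.raise_for_status()
    # Treat a missing header as "enough headroom" so the probe doesn't flap.
    remaining = int(resp.headers.get("X-RateLimit-Remaining", MIN_REMAINING))
    if remaining < MIN_REMAINING:
        print(f"WARNING: only {remaining} requests left in the current window")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(check_rate_limit_headroom())
```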
The biggest AI-driven failures we’ve seen didn’t come from “bad” code. They came from silent errors — missing data, broken queues, retry storms — that went undetected for hours. Invest in observability, fallback logic, and rollbacks. Especially if you’re letting ChatGPT write migrations.
In short, AI can save your team time, but it can’t take responsibility.
That’s still a human job.
AI can help you move faster. But it can’t think for you.
It doesn’t understand your architecture. It doesn’t know what “done” means in your context. And it definitely doesn’t care if your data pipeline silently breaks on a Friday night.
That’s why, as CTOs, we need to stay focused on system resilience, not just speed.
It’s tempting to let AI handle the boring parts. And sometimes that’s fine. But every shortcut comes with a tradeoff. When AI-generated code slips through unchecked, it often becomes AI technical debt. The kind you don’t see until your ops team is firefighting in production.
If you’ve already run into that wall, you’re not alone. We’ve helped teams recover from everything from broken migrations to API disasters. We don’t just refactor code. We help refactor the thinking behind it.
Because in the end, that’s what actually makes systems reliable.