Why You Need an “Heir and a Spare” for Your AI Toolkit
It’s not convenient, but it is necessary.
I was listening to a Wall Street Journal tech update podcast, and they were talking about the increasing cost of compute. We can’t build data centers fast enough, so the price of compute is going up. We also can’t get chips fast enough, so the machines to put into the data centers are costing more. And let’s not even talk about the cost of power.
It’s a domino effect: there’s not enough compute, but everyone wants more, so costs go up. It’s the Jevons Paradox: as something becomes easier to use, we end up using more of it. We’re seeing the knock-on effects of both capacity constraints and the rising cost of compute in the usage limits and caps hitting every AI user, from free to paid. The solution?
Having an heir and a spare in your AI toolkit.
Free and paid plans are feeling the crunch
Lately, when I’ve run AI training classes and used ChatGPT on a free plan, I’ve run out of usage really fast, a lot faster than I expected. The same thing has been happening with Claude. I’m on the free plan most of the time, but things I used to do that weren’t a big deal, and didn’t seem to use a lot of compute, are now causing me to hit usage limits pretty hard, pretty fast.
Even people who pay for Claude (those are the ones we hear about the most, and I’m sure it’s the same for ChatGPT) have to adjust how they use it and what tasks they do. Christopher Penn wrote a whole piece on how to balance your usage between local models and cloud models, and how to pick the right model for the right task. Some of it hits a 12 on the geek-o-meter, but it’s an important thing to think about.
When you’re starting a task on Claude or ChatGPT, it’s not business as usual anymore. You need to plan the what, when, and how of getting your work done. Sometimes that might mean switching tools, not just to match the tool to the job (like deep research or image generation in Gemini), but just to keep working, period. I realized this morning that the smart way to use AI is to have what I’m calling the “Heir and Spare” model. The “Heir” is the one you’re going to use all the time: Claude, ChatGPT, Gemini, etc. The “Spare” is your backup for when your primary runs out but you still have things to get done.
Obviously, this is not ideal, because every model handles things a little differently. If you’re doing a writing project in Claude, where you’ve trained it on your voice, it has memory, and the project knows what you want and how you express yourself, and then you have to pivot to ChatGPT or Gemini, the result is going to change.
You have to pick where you throw your “Spare” into the mix and what kinds of tasks it’s best suited for. Some data analysis or checking through a spreadsheet is probably fine for switching back and forth, but a big writing project? Probably not.
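If you like to think in code, here is a minimal sketch of that decision. Everything in it is an assumption for illustration: the `Tool` class, the `remaining` counter (real tools rarely tell you exactly how much usage you have left), and the task labels are all made up, and it doesn't call any real AI service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tool:
    """One AI tool plus a rough guess at how many requests it has left."""
    name: str
    remaining: int  # hypothetical counter, not something a real tool reports

    def has_capacity(self) -> bool:
        return self.remaining > 0


# Hypothetical labels for work that leans on trained voice, memory, or
# project context, so it should stay with the heir.
HEIR_ONLY_TASKS = {"long_form_writing"}


def pick_tool(task: str, heir: Tool, spare: Tool) -> Optional[Tool]:
    """Return the tool to use for a task, or None if the task should wait."""
    if heir.has_capacity():
        return heir
    if task in HEIR_ONLY_TASKS:
        return None  # better to wait for the heir's limit to reset
    if spare.has_capacity():
        return spare
    return None


heir = Tool("Claude", remaining=0)   # the primary has hit its limit
spare = Tool("Gemini", remaining=5)  # the backup still has room

for task in ("spreadsheet_check", "long_form_writing"):
    choice = pick_tool(task, heir, spare)
    print(task, "->", choice.name if choice else "wait for reset")
```

With the heir tapped out, the spreadsheet check goes to the spare and the writing project waits for the reset. Which tasks belong on the heir-only list is a personal call.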
How does Google Gemini play into all this?
What I find interesting, and this is a segue into talking about Google and Gemini, is that when I use Antigravity on a free plan, I haven’t had too many issues with hitting usage limits. I can switch between several model settings: fast, pro low, and pro high for Gemini, plus the two Claude Sonnet models (low and high). Whichever setting I pick counts against the usage for that model; choosing low versus high just determines how fast that usage is burned. I fully admit I do not manage my usage well. I could probably do better by using fast or Haiku or any of the “less smart” models.
But that’s not quite where I’m going with this.
The most interesting thing I’m seeing is that I haven’t experienced much disruption from hitting usage limits in Gemini day-to-day. I pay for a Google Workspace Pro plan. It’s not the super ultra; it’s the one that’s about $22–$24 Canadian and gives me a lot of bang for my buck. I can’t remember the last time it said, “sorry dude, that’s all for today.”
As a side note, I’m going to juggle things a bit soon: downgrade my Workspace account and upgrade my personal Google/Gmail account to Google One AI Pro. I’ll get even better bang for my buck, because I’ll have better access to new tools not available to Workspace users, and sharing 5TB of space with my family will alleviate the strain on my wife’s overburdened Google Drive.
Granted, on a free Gemini plan you do hit usage limits on “Thinking” mode somewhat fast. I taught an AI 101 training session with someone, and he wasn’t paying for any AI, so we started with Gemini free. We got a good way into the lesson before he ran out of “Thinking” usage. In hindsight, there are a couple of spots where we could have used “Fast” mode instead and it would have been fine. Next time, I’d save Thinking mode for deep research and maybe building a prompt (hindsight is always 20/20).
But I digress.
I think the reason Google is so good with its usage limits is simply that they have their own data centers. They’re not renting space from Microsoft, Amazon, or Google the way Anthropic and OpenAI are. Google has the servers. They have their own chips. While I’m sure they are having compute capacity issues, they seem to be managing them well.
This is why I think Google is going to have a long-term edge over Anthropic and OpenAI. Google and Microsoft are going to make the go-to AI tools for work. Both have solid enterprise and education install bases (clearly Microsoft leads the pack here). Both have their own data centers. Both have more funds than either Anthropic or OpenAI to throw at more data centers, more chips, more everything.
Don’t get me wrong, I think Claude and ChatGPT are amazing. They have changed how we work. But over the long term, and especially until we resolve these compute issues, I think it’s Microsoft and Google, the companies that own the compute, that can weather the storm, manage the costs better, and be more generous at the free tier. Getting people hooked on the free tier attracts more users, because “Oh, this is great on free, I can do a lot” moves pretty easily into “wow, that’s not a bad price to upgrade and get all this extra stuff to do even more.” Case in point: me.
Semper gumby—Always flexible
The lesson here is this: until the compute problem is solved, you need to use the heir and the spare model for your AI tools. You have a primary one that you train, build all your tools in, and use most of the time, but you also have a backup that can bridge the gap and be good enough until the usage limits reset on your primary tool.
Is this ideal?
No.
Is it going to be disruptive?
Probably.
Is it essential for how you use AI in the future, for getting work done with AI without being slowed down or losing productivity?
Absolutely.
What’s your setup? What’s your heir and your spare?