It's like being Rip Van Winkle and getting slapped a lot.
Great analogies. You might want to re-examine your assumption that inference is expensive, though: quantized LLMs run just fine on consumer hardware now, with no need for custom silicon or large amounts of RAM. I expect OpenAI's costs are orders of magnitude below what they charge (and if they aren't yet, they soon will be).
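For anyone wondering why quantization makes consumer-hardware inference feasible: the core trick is storing each weight in a few bits plus a per-block scale factor, instead of 32-bit floats. Here's a rough sketch of block-wise absmax quantization to 4 bits (function names are my own, just for illustration; real runtimes like llama.cpp use fancier schemes):

```python
# Sketch of 4-bit absmax quantization: store ints in [-8, 7] plus one
# float scale per block. ~8x smaller than float32, with small error.
def quantize_4bit(weights):
    """Quantize a block of floats to 4-bit ints plus a shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(qs, scale):
    """Recover approximate float weights from quantized ints."""
    return [q * scale for q in qs]

block = [0.12, -0.53, 0.91, -0.07]
qs, scale = quantize_4bit(block)
recovered = dequantize(qs, scale)
# recovered is close to block; the max error is about half a scale step.
```

A 7B-parameter model drops from ~28 GB at float32 to ~4 GB at 4 bits, which is why it fits in an ordinary laptop's memory.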