I'm just thinking out loud here, but...

If the token window is limited, and we want to have a larger input + output than the window allows...

Could we use a sort of rolling "convolution" over the input in order to get an output that is larger than the window might allow?


Window: 8k tokens

Input: 20k tokens


Parse the first 4k tokens of input, generate 2k tokens of output

Append 2k output to the next 4k tokens of input, generate 2k tokens of output

Repeat until the input has been fully parsed, or until the window is full of output tokens.
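A minimal sketch of that loop, with a stub `generate()` standing in for the model call and plain ints standing in for tokens (both are assumptions, not a real API):

```python
WINDOW = 8_000   # model context window
CHUNK = 4_000    # input tokens consumed per step
OUT = 2_000      # output tokens produced per step

def generate(prompt_tokens, n_out):
    """Stub for an LLM call: just echoes the prompt's tail as 'output'."""
    return prompt_tokens[-n_out:]

def rolling_generate(input_tokens):
    carry = []       # previous step's 2k output, carried into the next prompt
    outputs = []
    for i in range(0, len(input_tokens), CHUNK):
        chunk = input_tokens[i:i + CHUNK]
        prompt = carry + chunk                  # at most 2k carry + 4k input = 6k
        assert len(prompt) + OUT <= WINDOW      # 6k prompt + 2k output fits in 8k
        carry = generate(prompt, OUT)
        outputs.extend(carry)
    return outputs

out = rolling_generate(list(range(20_000)))     # 20k-token input
print(len(out))  # 5 steps x 2k = 10k output tokens, more than one window holds
```

So a 20k input yields 10k of output even though no single call ever sees more than 8k tokens; the catch is that each step only "remembers" the previous step through the 2k carry.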


Your technical understanding of the AI world is really impressive. I've been reading your work for a few months and thought I should share my appreciation for your tech chops. 🙏


You might be able to use GPT-4 itself to do the context compression, producing context tokens to reuse in the future ("summarize your impression in n words"), just as LLMs can generate their own instruction data to be trained on.
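A hedged sketch of that self-compression loop; the `llm()` callable is a placeholder stub (a real version would call GPT-4), and the prompt wording is illustrative:

```python
def llm(prompt: str) -> str:
    """Stub for a model call; dummy truncation stands in for a real summary."""
    return prompt[:50]

def compress_context(history: str, n_words: int = 100) -> str:
    """Ask the model to compress its own context into a short summary."""
    prompt = (f"Summarize your impression of the conversation so far "
              f"in {n_words} words:\n\n{history}")
    return llm(prompt)

def chat_turn(compressed: str, user_msg: str) -> tuple[str, str]:
    """One turn: answer from the compressed context, then re-compress."""
    reply = llm(compressed + "\n" + user_msg)
    new_compressed = compress_context(compressed + user_msg + reply)
    return reply, new_compressed
```

The running context stays bounded at roughly the summary length per turn, at the cost of whatever detail the model drops when summarizing itself.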


Regarding GPT-4 as a noise filter: I think it's too creative and subjective to do a proper job.

I successfully used it to clean up random parts of a hundred-year-old print of a Hungarian drama that had been OCR'd. There the grammatical structure is very strict and leaves little room for creativity (it's also possible that GPT-4 "understands" this as a strict, archivist's task, unlike podcast-transcript editing). Is an increase in the token window the solution for pulling off a job that a human should be able to perform even when given only a part of it, without context?

It's probably better to have a simpler transformer trained for this kind of cleanup, like DeepL.
