This multi-tiered approach allows the model to handle sequences more than 2 million tokens long, far beyond what current transformers can process efficiently.

Image: Google

According to the research ...