LLM_architectures
Nemotron-4 15B Technical Report (arXiv:2402.16819)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
RWKV: Reinventing RNNs for the Transformer Era (arXiv:2305.13048)
Reformer: The Efficient Transformer (arXiv:2001.04451)
Attention Is All You Need (arXiv:1706.03762)
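
The core operation this paper introduces is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch for reference (variable names and shapes are illustrative, not from the paper's reference code):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_q, seq_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # e.g. a causal mask
    weights = softmax(scores, axis=-1)
    return weights @ V                         # (seq_q, d_v)

# toy usage
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8)); K = rng.normal(size=(6, 8)); V = rng.normal(size=(6, 16))
assert scaled_dot_product_attention(Q, K, V).shape == (4, 16)
```
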
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (arXiv:1910.10683)
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (arXiv:2112.06905)
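
GLaM activates only a small fraction of its parameters per token via top-2 gated mixture-of-experts layers. A hedged NumPy sketch of top-2 token routing (gate and expert names are illustrative; GLaM's production implementation is sharded and differs in detail):

```python
import numpy as np

def top2_moe_layer(x, W_gate, experts):
    """Top-2 mixture-of-experts routing sketch.

    x: (tokens, d_model); W_gate: (d_model, n_experts);
    experts: list of callables mapping (n, d_model) -> (n, d_model).
    """
    logits = x @ W_gate                                   # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    top2 = np.argsort(probs, axis=-1)[:, -2:]             # two largest gates per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        e1, e2 = top2[t]
        g = probs[t, [e1, e2]]
        g = g / g.sum()                                   # renormalize the two gate values
        out[t] = g[0] * experts[e1](x[t:t+1])[0] + g[1] * experts[e2](x[t:t+1])[0]
    return out

# toy usage with random linear experts
rng = np.random.default_rng(0)
d, n_exp = 8, 4
W_gate = rng.normal(size=(d, n_exp))
experts = [(lambda h, W=rng.normal(size=(d, d)): h @ W) for _ in range(n_exp)]
x = rng.normal(size=(5, d))
assert top2_moe_layer(x, W_gate, experts).shape == x.shape
```
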
UL2: Unifying Language Learning Paradigms (arXiv:2205.05131)
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (arXiv:2211.05100)
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (arXiv:2301.13688)
Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv:2307.09288)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
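
Mamba replaces attention with a selective state-space recurrence in which the projections B, C and the step size Delta are functions of the input. A sequential single-channel NumPy sketch of that recurrence (real implementations use a hardware-aware parallel scan; the zero-order-hold discretization of A and the simplified Euler discretization of B follow the paper, everything else here is illustrative):

```python
import numpy as np

def selective_ssm_scan(x, A, B, C, delta):
    """Sequential sketch of a selective SSM recurrence (single channel).

    x:     (T,)    input sequence
    A:     (N,)    diagonal state matrix (time-invariant)
    B, C:  (T, N)  input-dependent projections (the "selective" part)
    delta: (T,)    input-dependent step sizes

    Discretization:
        A_bar_t = exp(delta_t * A),  B_bar_t = delta_t * B_t
        h_t = A_bar_t * h_{t-1} + B_bar_t * x_t,   y_t = <C_t, h_t>
    """
    T, N = B.shape
    h = np.zeros(N)
    y = np.empty(T)
    for t in range(T):
        A_bar = np.exp(delta[t] * A)              # (N,)
        h = A_bar * h + (delta[t] * B[t]) * x[t]  # state update
        y[t] = C[t] @ h                           # readout
    return y
```
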
Textbooks Are All You Need (arXiv:2306.11644)
Mistral 7B (arXiv:2310.06825)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
Gemini: A Family of Highly Capable Multimodal Models (arXiv:2312.11805)
Mixtral of Experts (arXiv:2401.04088)
The Falcon Series of Open Language Models (arXiv:2311.16867)
Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295)
Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
ReALM: Reference Resolution As Language Modeling (arXiv:2403.20329)
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (arXiv:2404.05892)
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (arXiv:2404.07839)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
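
Infini-attention augments ordinary local softmax attention with a compressive memory that is queried through a linear-attention feature map (ELU + 1) and mixed in via a learned gate. A hedged sketch of the memory path, assuming the simpler non-delta-rule memory update; names and shapes are illustrative, not the paper's code:

```python
import numpy as np

def elu1(x):
    # ELU + 1 feature map used for the linear-attention memory
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention_segment(Q, K, V, M, z, beta, local_attn):
    """One segment of an Infini-attention-style compressive-memory pass.

    Q, K: (T, d_k); V: (T, d_v); M: (d_k, d_v) running memory;
    z: (d_k,) normalization term; beta: scalar learned gate;
    local_attn(Q, K, V) computes standard softmax attention within the segment.
    """
    sQ, sK = elu1(Q), elu1(K)
    A_mem = (sQ @ M) / (sQ @ z + 1e-6)[:, None]  # retrieve from memory
    M = M + sK.T @ V                             # linear memory update
    z = z + sK.sum(axis=0)
    g = 1.0 / (1.0 + np.exp(-beta))              # sigmoid gate
    A = g * A_mem + (1.0 - g) * local_attn(Q, K, V)
    return A, M, z
```
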
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arXiv:2404.14219)
You Only Cache Once: Decoder-Decoder Architectures for Language Models (arXiv:2405.05254)
TransformerFAM: Feedback attention is working memory (arXiv:2404.09173)
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation (arXiv:2303.08302)
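
The low-rank compensation (LoRC) idea here is to approximate the quantization error E = W - W_hat with a rank-r factorization, so W is served as W_hat + U @ Vt. A hedged NumPy sketch combining generic symmetric per-channel int8 weight quantization (not the paper's exact recipe) with an SVD-based correction:

```python
import numpy as np

def quantize_int8_per_channel(W):
    """Symmetric per-output-channel int8 weight quantization (generic PTQ)."""
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def low_rank_compensation(W, W_hat, rank=8):
    """LoRC-style correction: rank-r SVD approximation of the error W - W_hat."""
    E = W - W_hat
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # fold singular values into the left factor
    return U_r, Vt[:rank]

# toy check: the low-rank term strictly reduces reconstruction error
W = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8_per_channel(W)
W_hat = q.astype(np.float32) * s
U_r, Vt_r = low_rank_compensation(W, W_hat, rank=8)
assert np.linalg.norm(W - (W_hat + U_r @ Vt_r)) < np.linalg.norm(W - W_hat)
```
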
Kolmogorov-Arnold Transformer (arXiv:2409.10594)
Fast Inference from Transformers via Speculative Decoding (arXiv:2211.17192)
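
In this scheme a small draft model proposes gamma tokens which the target model verifies in a single pass, accepting each token x with probability min(1, p(x)/q(x)) and, on rejection, resampling from the residual distribution norm(max(0, p - q)). A hedged sketch of one verification step (array layout and names are illustrative):

```python
import numpy as np

def speculative_step(p_dist, q_dist, drafted, rng):
    """One verification step of speculative decoding.

    p_dist: (gamma + 1, V) target-model distributions, one per drafted
            position plus one extra for the bonus token;
    q_dist: (gamma, V) draft-model distributions;
    drafted: the gamma tokens sampled from the draft model.
    Returns the accepted prefix plus one corrected or bonus token.
    """
    accepted = []
    for i, tok in enumerate(drafted):
        p, q = p_dist[i][tok], q_dist[i][tok]
        if rng.random() < min(1.0, p / q):          # accept with prob min(1, p/q)
            accepted.append(tok)
        else:
            # rejected: resample from the residual norm(max(0, p - q))
            resid = np.maximum(p_dist[i] - q_dist[i], 0.0)
            resid /= resid.sum()
            accepted.append(rng.choice(len(resid), p=resid))
            return accepted
    # all drafts accepted: sample one bonus token from the target model
    accepted.append(rng.choice(p_dist.shape[1], p=p_dist[-1]))
    return accepted
```
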
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning (arXiv:2502.06781)
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (arXiv:2502.05171)