A Roundup of Modified Attention Mechanisms
极市平台
10,214 characters, about a 21-minute read
2020-08-29 08:09
Editor's note
How can attention be made more efficient? This article surveys the relevant papers and organizes their citation counts, code implementations, algorithmic complexity, and key ideas for side-by-side comparison.
Efficient Attention
Paper (citations) | Code | Complexity | Autoregressive | Main Idea |
---|---|---|---|---|
Generating Wikipedia by Summarizing Long Sequences[1] (208) | memory-compressed-attention[2] | |||
CBAM: Convolutional Block Attention Module[3] (677) | attention-module[4] | |||
CCNet: Criss-Cross Attention for Semantic Segmentation[5] (149) | CCNet[6] | |||
Efficient Attention: Attention with Linear Complexities[7] (2) | efficient-attention[8] | |||
Star-Transformer[9] (24) | fastNLP[10] | |||
Generating Long Sequences with Sparse Transformers[11] (139) | torch-blocksparse[12] | |||
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond[13] (96) | GCNet[14] | |||
SCRAM: Spatially Coherent Randomized Attention Maps[15] (1) | - | |||
Interlaced Sparse Self-Attention for Semantic Segmentation[16] (13) | IN_PAPER | |||
Permutohedral Attention Module for Efficient Non-Local Neural Networks[17] (2) | Permutohedral_attention_module[18] | |||
Large Memory Layers with Product Keys[19] (28) | XLM[20] | |||
Expectation-Maximization Attention Networks for Semantic Segmentation[21] (38) | EMANet[22] | |||
Compressive Transformers for Long-Range Sequence Modelling[23] (20) | compressive-transformer-pytorch[24] | |||
BP-Transformer: Modelling Long-Range Context via Binary Partitioning[25] (8) | BPT[26] | |||
Axial Attention in Multidimensional Transformers[27] (5) | axial-attention[28] | |||
Reformer: The Efficient Transformer[29] (69) | trax[30] | |||
Transformer on a Diet[31] (2) | transformer-on-diet[32] | |||
Sparse Sinkhorn Attention[33] (4) | sinkhorn-transformer[34] | |||
SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection[35] (1) | - | |||
Efficient Content-Based Sparse Attention with Routing Transformers[36] (11) | routing-transformer[37] | |||
Longformer: The Long-Document Transformer[38] (15) | longformer[39] | |||
Neural Architecture Search for Lightweight Non-Local Networks[40] (2) | AutoNL[41] | |||
ETC: Encoding Long and Structured Data in Transformers[42] (2) | - | |||
Multi-scale Transformer Language Models[43] (1) | IN_PAPER | |||
Synthesizer: Rethinking Self-Attention in Transformer Models[44] (5) | - | |||
Jukebox: A Generative Model for Music[45] (9) | jukebox[46] | |||
GMAT: Global Memory Augmentation for Transformers[47] (0) | gmat[48] | |||
Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers[49] (0) | google-research[50] | |||
Hand-crafted Attention is All You Need? A Study of Attention on Self-supervised Audio Transformer[51] (0) | - | |||
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention[52] (1) | fast-transformers[53] | |||
Linformer: Self-Attention with Linear Complexity[54] (3) | linformer-pytorch[55] | |||
Real-time Semantic Segmentation with Fast Attention[56] (0) | - | |||
Fast Transformers with Clustered Attention[57] (0) | fast-transformers[58] | |||
Big Bird: Transformers for Longer Sequences[59] (0) | - |
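Several of the papers in the table (Efficient Attention: Attention with Linear Complexities; Transformers are RNNs; the linearly scalable long-context transformer for proteins) share one core trick: replace the softmax with a kernel feature map φ so that attention can be computed as φ(Q)(φ(K)ᵀV) instead of softmax(QKᵀ)V. Reordering the matrix products this way drops the cost from O(n²) to O(n) in sequence length n. A minimal NumPy sketch of the idea is below; the ReLU-based feature map is a stand-in assumption (the papers themselves use e.g. elu(x)+1 or random Fourier features):

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized (linear) attention.

    Q: (n, d) queries, K: (n, d) keys, V: (n, d_v) values.
    phi is a positive feature map; here a ReLU shifted to stay
    strictly positive so the normalizer never hits zero.
    """
    Qp, Kp = phi(Q), phi(K)
    # Compute K^T V first: a (d, d_v) matrix, independent of n.
    KV = Kp.T @ V
    # Per-query normalizer: phi(q_i) . sum_j phi(k_j), shape (n,).
    Z = Qp @ Kp.sum(axis=0)
    # Each output row is phi(q_i) KV / Z_i -- O(n * d * d_v) overall.
    return (Qp @ KV) / Z[:, None]

# Toy usage: 8 tokens, head dimension 4.
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 8, 4))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

The same reordering underlies the autoregressive "RNN view" of these models: because KV and the normalizer are running sums over keys, they can be updated one token at a time with constant memory.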