We provide reference implementations of various sequence modeling papers:
List of implemented papers
- Convolutional Neural Networks (CNN)
- Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)
- Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
- Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)
- Hierarchical Neural Story Generation (Fan et al., 2018)
- wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- LightConv and DynamicConv models
- Long Short-Term Memory (LSTM) networks
- Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
- Transformer (self-attention) networks
- Attention Is All You Need (Vaswani et al., 2017)
- Scaling Neural Machine Translation (Ott et al., 2018)
- Understanding Back-Translation at Scale (Edunov et al., 2018)
- Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)
- Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)
- Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)
- Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
- Facebook FAIR’s WMT19 News Translation Task Submission (Ng et al., 2019)
- Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)
- Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)
- Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)
- Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)
- Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)
- Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)
- Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)
- Deep Transformers with Latent Depth (Li et al., 2020)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)
- Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu et al., 2021)
- Non-autoregressive Transformers
- Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
- Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)
- Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
- Finetuning
What's New:
- July 2021: Released DrNMT code
- July 2021: Released Robust wav2vec 2.0 model
- June 2021: Released XLMR-XL and XLMR-XXL models
- March 2021: Added full parameter and optimizer state sharding + CPU offloading
- February 2021: Added LASER training code
- December 2020: Added Adaptive Attention Span code
- December 2020: GottBERT model and code released
- November 2020: Adopted the Hydra configuration framework
- November 2020: fairseq 0.10.0 released
- October 2020: Added R3F/R4F (Better Fine-Tuning) code
- October 2020: Deep Transformer with Latent Depth code released
- October 2020: Added CRISS models and code
Previous updates
- September 2020: Added Linformer code
- September 2020: Added pointer-generator networks
- August 2020: Added lexically constrained decoding
- August 2020: wav2vec2 models and code released
- July 2020: Unsupervised Quality Estimation code released
- May 2020: Follow fairseq on Twitter
- April 2020: Monotonic Multihead Attention code released
- April 2020: Quant-Noise code released
- April 2020: Initial model parallel support and 11B parameters unidirectional LM released
- March 2020: Byte-level BPE code released
- February 2020: mBART model and code released
- February 2020: Added tutorial for back-translation
- December 2019: fairseq 0.9.0 released
- November 2019: VizSeq released (a visual analysis toolkit for evaluating fairseq models)
- November 2019: CamemBERT model and code released
- November 2019: BART model and code released
- November 2019: XLM-R models and code released
- September 2019: Nonautoregressive translation code released
- August 2019: WMT’19 models released
- July 2019: fairseq relicensed under MIT license
- July 2019: RoBERTa models and code released
- June 2019: wav2vec models and code released
Features:
- multi-GPU training on one machine or across multiple machines (data and model parallel)
- fast generation on both CPU and GPU with multiple search algorithms implemented:
- beam search
- Diverse Beam Search (Vijayakumar et al., 2016)
- sampling (unconstrained, top-k and top-p/nucleus)
- lexically constrained decoding (Post & Vilar, 2018)
- gradient accumulation enables training with large mini-batches even on a single GPU (a training-command sketch follows this list)
- mixed precision training (trains faster with less GPU memory on NVIDIA tensor cores)
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
- flexible configuration based on Hydra, allowing a combination of code, command-line and file based configuration
- full parameter and optimizer state sharding
- offloading parameters to CPU
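As a rough illustration of how the gradient accumulation and mixed precision features above are exposed on the command line, here is a minimal sketch of a fairseq-train invocation; the data directory, architecture and hyperparameters are placeholders rather than a recommended recipe:
# data-bin/my_dataset is a placeholder path to a binarized dataset
# --update-freq 8 accumulates gradients over 8 batches (simulating an 8x larger mini-batch)
# --fp16 enables mixed precision training
fairseq-train data-bin/my_dataset \
    --arch transformer --share-decoder-input-output-embed \
    --optimizer adam --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --update-freq 8 --fp16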
We also provide pre-trained models for translation and language modeling with a convenient torch.hub interface:
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'
See the PyTorch Hub tutorials for translation and RoBERTa for more examples.
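For instance, a RoBERTa model can be loaded in the same way; this is a minimal sketch assuming the roberta.large hub entry point and the fill_mask helper described in the RoBERTa tutorial:
import torch

# load a pre-trained RoBERTa model from the PyTorch Hub
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout for evaluation
# predict the top-3 fillers for the masked token
roberta.fill_mask('The first Star Wars film was released in <mask>.', topk=3)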
Requirements and Installation
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
# to install the latest stable release (0.10.x)
# pip install fairseq
- For faster training install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--deprecated_fused_adam" --global-option="--xentropy" \
--global-option="--fast_multihead_attn" ./
- For large datasets install PyArrow:
pip install pyarrow
- If you use Docker, make sure to increase the shared memory size, either with --ipc=host or with --shm-size as command-line options to nvidia-docker run (example below).
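A minimal sketch; the image name is a placeholder for whatever image you actually run:
# share the host IPC namespace so the container gets enough shared memory
nvidia-docker run --ipc=host -it --rm my-fairseq-image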
Getting Started
The full documentation contains instructions for getting started, training new models, and extending fairseq with new model types and tasks.
Pre-trained models and examples
We provide pre-trained models and pre-processed, binarized test sets for the tasks listed below, as well as example training and evaluation commands (a generic evaluation sketch follows this list).
- Translation: convolutional and transformer models are available
- Language Modeling: convolutional and transformer models are available
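As a rough sketch only (the data directory and checkpoint path are placeholders; see the task-specific READMEs for exact recipes), an evaluation command typically looks like:
# translate a binarized test set with beam search
# data-bin/wmt14_en_de and checkpoints/model.pt are placeholder paths
fairseq-generate data-bin/wmt14_en_de \
    --path checkpoints/model.pt \
    --beam 5 --remove-bpe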
We also have more detailed READMEs to reproduce results from specific papers:
- Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)
- Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)
- Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020)
- Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)
- Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)
- Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)
- Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
- Facebook FAIR’s WMT19 News Translation Task Submission (Ng et al., 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
- wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)
- Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)
- Understanding Back-Translation at Scale (Edunov et al., 2018)
- Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)
- Hierarchical Neural Story Generation (Fan et al., 2018)
- Scaling Neural Machine Translation (Ott et al., 2018)
- Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
- Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)
Join the fairseq community
- Twitter: https://twitter.com/fairseq
- Facebook page: https://www.facebook.com/groups/fairseq.users
- Google group: https://groups.google.com/forum/#!forum/fairseq-users
License
fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.
Citation
Please cite as:
@inproceedings{ott2019fairseq,
title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
year = {2019},
}