- Convolutional Neural Networks (CNN)
  - Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)
  - Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
  - Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)
  - Hierarchical Neural Story Generation (Fan et al., 2018)
  - wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- LightConv and DynamicConv models
- Long Short-Term Memory (LSTM) networks
  - Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
- Transformer (self-attention) networks
  - Attention Is All You Need (Vaswani et al., 2017)
  - Scaling Neural Machine Translation (Ott et al., 2018)
  - Understanding Back-Translation at Scale (Edunov et al., 2018)
  - Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)
  - Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)
  - Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)
  - Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)
  - Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)
  - RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
  - Facebook FAIR’s WMT19 News Translation Task Submission (Ng et al., 2019)
  - Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)
  - Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)
  - Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)
  - Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)
  - wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)
  - Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)
  - Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)
  - Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)
  - Deep Transformers with Latent Depth (Li et al., 2020)
  - Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)
  - Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu et al., 2021)
- Non-autoregressive Transformers
  - Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
  - Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)
  - Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)
  - Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
  - Levenshtein Transformer (Gu et al., 2019)
- Finetuning
- July 2021: Released DrNMT code
- July 2021: Released Robust wav2vec 2.0 model
- June 2021: Released XLMR-XL and XLMR-XXL models
- March 2021: Added full parameter and optimizer state sharding + CPU offloading
- February 2021: Added LASER training code
- December 2020: Added Adaptive Attention Span code
- December 2020: GottBERT model and code released
- November 2020: Adopted the Hydra configuration framework
- November 2020: fairseq 0.10.0 released
- October 2020: Added R3F/R4F (Better Fine-Tuning) code
- October 2020: Deep Transformer with Latent Depth code released
- October 2020: Added CRISS models and code
- September 2020: Added Linformer code
- September 2020: Added pointer-generator networks
- August 2020: Added lexically constrained decoding
- August 2020: wav2vec2 models and code released
- July 2020: Unsupervised Quality Estimation code released
- May 2020: Follow fairseq on Twitter
- April 2020: Monotonic Multihead Attention code released
- April 2020: Quant-Noise code released
- April 2020: Initial model parallel support and 11B parameters unidirectional LM released
- March 2020: Byte-level BPE code released
- February 2020: mBART model and code released
- February 2020: Added tutorial for back-translation
- December 2019: fairseq 0.9.0 released
- November 2019: VizSeq released (a visual analysis toolkit for evaluating fairseq models)
- November 2019: CamemBERT model and code released
- November 2019: BART model and code released
- November 2019: XLM-R models and code released
- September 2019: Nonautoregressive translation code released
- August 2019: WMT’19 models released
- July 2019: fairseq relicensed under the MIT license
- July 2019: RoBERTa models and code released
- June 2019: wav2vec models and code released
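- Multi-GPU training on one machine or across multiple machines (data and model parallel)
- Fast generation on both CPU and GPU with multiple search algorithms implemented:
  - Beam search
  - Diverse Beam Search (Vijayakumar et al., 2016)
  - Sampling (unconstrained, top-k and top-p/nucleus)
  - Lexically constrained decoding (Post & Vilar, 2018)
- Gradient accumulation enables training with large mini-batches even on a single GPU
- Mixed precision training (trains faster with less GPU memory on NVIDIA tensor cores)
- Extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
- Flexible configuration based on Hydra, allowing a combination of code, command-line and file-based configuration
- Full parameter and optimizer state sharding
- Offloading parameters to CPU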
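Of the search algorithms listed above, top-k and top-p (nucleus) sampling are easy to illustrate in isolation. The following is a minimal pure-Python sketch of the general technique, not fairseq's own implementation: the function names (`softmax`, `top_k_filter`, `top_p_filter`, `sample`) and the example scores are assumptions made for illustration only.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their mass to 1."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def sample(dist, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


# Hypothetical next-token scores over a 5-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = softmax(logits)

top2 = top_k_filter(probs, k=2)        # restricts sampling to tokens 0 and 1
nucleus = top_p_filter(probs, p=0.9)   # restricts sampling to tokens 0, 1 and 2
token = sample(nucleus, random.Random(0))
```

Unconstrained sampling corresponds to drawing from `probs` directly; the two filters trade diversity for quality by cutting off the low-probability tail before the draw.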
We also provide pre-trained models for translation and language modeling with a convenient torch.hub interface:
import torch

# Load a pre-trained English-German Transformer through torch.hub
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'
See the PyTorch Hub tutorials for translation and RoBERTa for more examples.
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
# to install the latest stable release (0.10.x)
# pip install fairseq
- For faster training install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--deprecated_fused_adam" --global-option="--xentropy" \
--global-option="--fast_multihead_attn" ./
- For large datasets install PyArrow:
pip install pyarrow
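- If you use Docker, make sure to increase the shared memory size by passing the appropriate command-line option (for example Docker's --ipc=host or --shm-size) to nvidia-docker run.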
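After running the installation steps above, it can be useful to confirm which packages are visible to the current Python environment. The snippet below is a small sketch using only the standard library; the helper name `is_installed` is an illustrative assumption, and the module names are the import names of PyTorch, fairseq, apex and PyArrow.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# apex and pyarrow are optional; torch and fairseq are needed to use the toolkit.
for name in ("torch", "fairseq", "apex", "pyarrow"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

A "missing" result for apex or pyarrow only means the corresponding optional speed-up or large-dataset support is unavailable.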
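The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.
- Translation: convolutional and transformer models are available
- Language Modeling: convolutional and transformer models are available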
