license: other
extra_gated_prompt: >-
  ### MULTI-TOKEN PREDICTION RESEARCH LICENSE AGREEMENT 18th June 2024

  This Multi-token Prediction Research License (“Agreement”) contains the terms
  and conditions that govern your access and use of the Materials (as defined
  below). You may not use the Materials if you do not accept this Agreement.  By
  clicking "submit" below to accept, or accessing, using, or distributing any
  portion or element of the Materials you hereby agree to be bound by the terms
  of this Agreement.  If you are agreeing to be bound by the Agreement on behalf
  of your employer or other entity, you represent and warrant to Meta Platforms
  Ireland Limited (if you are located in or, if you are an entity, your
  principal place of business is in the EEA or Switzerland) and Meta Platforms,
  Inc. (if you are located outside of the EEA or Switzerland) (“Meta”) that you
  have full legal authority to bind your employer or such entity to this
  Agreement.  If you do not have requisite authority, you may not accept the
  Agreement or access the Materials on behalf of your employer or other entity.

  This Agreement is effective upon the earlier of the date that you first access
  the Materials or accept this Agreement (“Effective Date”), and is entered into
  by and between Meta, and you, or if you are entering into this Agreement on
  behalf of your employer or other entity (if you are entering into this
  Agreement on such person or entity’s behalf), of the age required under
  applicable laws, rules, or regulations to provide legal consent and, your
  employer or other entity and that has legal authority to bind your employer or
  such other person or entity if you are entering in this Agreement on their
  behalf (“Licensee” or “You”).

  1. Definitions.

    a. “Documentation” means the specifications, manuals and documentation accompanying this release distributed by Meta at https://huggingface.co/facebook/multi-token-prediction.

    b. “Noncommercial Research Uses” means noncommercial research use cases related to research, development, education, processing, or analysis and in each case, is not primarily intended for commercial advantage or monetary compensation to you or others.

    c. “Materials” means, collectively, Documentation and the models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code, demonstration materials and other elements of the foregoing distributed by Meta at https://huggingface.co/facebook/multi-token-prediction and made available under this Agreement.

    d. “Trade Control Laws” means any applicable U.S. and non-U.S. export control and trade sanctions laws and regulations.

    e. “Acceptable Use Policy” means the [LLaMA Acceptable Use Policy](https://ai.meta.com/llama/use-policy/) applicable to Materials that is incorporated into this Agreement.

  2. License Rights and Redistribution. Subject to Your compliance with the
  terms and conditions of this Agreement, Meta hereby grants you the following:

    a. Grant of Rights. You are hereby granted a non-exclusive, worldwide, non-transferable and royalty-free limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the Materials solely for Noncommercial Research Uses.

    b. Redistribution and Use.

      i. Distribution of Materials, and any derivative works thereof, are subject to the terms of this Agreement. If you distribute or make the Materials, or any derivative works thereof, available to a third party, you may only do so under the terms of this Agreement. You shall also provide a copy of this Agreement to such third party.

      ii. If you submit for publication the results of research you perform on, using, or otherwise in connection with Materials, you must acknowledge the use of Materials in your publication.

      iii. You must retain in all copies of the Materials that you distribute and include the following attribution notice within a “Notice” text file distributed as a part of such copies: “Materials are licensed under the Multi-token Prediction Research License, Copyright © Meta Platforms, Inc. All Rights Reserved.”

      iv. Your use of the Materials must comply with applicable laws and regulations (including Trade Control Laws) and adhere to the LLaMA Acceptable Use Policy, which is hereby incorporated by reference into this Agreement.

      v. You agree to validate and confirm LLaMA outputs for compliance with the LLaMA Acceptable Use Policy, including before relying on LLaMA outputs in any way as part of research activities or incorporating these outputs in research, studies, and papers.

      vi. You agree to report any violation of this Multi-token Prediction Research License or the Acceptable Use Policy, as outlined in the LLaMA Acceptable Use Policy.

  3. Restrictions. You will not, and will not permit, assist or cause any third
  party to:

    a. use the Materials or any outputs or results of the Materials in connection with any commercial uses or for any uses other than Noncommercial Research Uses;

    b. disguise your or their location through IP proxying or other methods;
    
    c. use or download Materials if you or they are: (a) located in a comprehensively sanctioned jurisdiction, (b) currently listed on any U.S. or non-U.S. restricted parties list, or (c) will use Materials for any purpose prohibited by Trade Control Laws; or

    d. directly or indirectly export, re-export, provide, or otherwise transfer Materials: (a) to any individual, entity, or country prohibited by Trade Control Laws; (b) to anyone on U.S. or non-U.S. government restricted parties lists; or (c) for any purpose prohibited by Trade Control Laws, including nuclear, chemical or biological weapons, or missile technology applications.

  4. User Support. Your Noncommercial Research Use of the Materials is done at
  your own discretion; Meta does not process any information nor provide any
  service in relation to such use.  Meta is under no obligation to provide any
  support services for the Materials. Any support provided is “as is”, “with all
  faults”, and without warranty of any kind.

  5. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE MATERIALS
  AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT
  WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, WITHOUT
  LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY,
  FITNESS FOR A PARTICULAR PURPOSE, THE ABSENCE OF LATENT OR OTHER DEFECTS,
  ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT DISCOVERABLE.
  YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING OR
  REDISTRIBUTING THE MATERIALS AND ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF
  THE MATERIALS AND ANY OUTPUT AND RESULTS.

  6. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE
  UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS
  LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS
  OR ANY DIRECT OR INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR
  PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE
  POSSIBILITY OF ANY OF THE FOREGOING.

  7. Intellectual Property.

    a. No trademark licenses are granted under this Agreement, and in connection with the Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Materials.

    b. Subject to Meta’s ownership of Materials and derivatives made by or for Meta, with respect to any derivative works and modifications of the Materials that are made by you, as between you and Meta, you are and will be the owner of such derivative works and modifications.

    c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Materials or outputs or results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable by you, then any licenses and rights  granted to you under this Agreement shall terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third party arising out of or related to your use or distribution of the Materials.

  8. Term and Termination. The term of this Agreement will commence upon your
  acceptance of this Agreement or access to the Materials and will continue in
  full force and effect until terminated in accordance with the terms and
  conditions herein. Meta may terminate this Agreement if you are in breach of
  any term or condition of this Agreement. Upon termination of this Agreement,
  you shall delete and cease use of the Materials. Sections 3, 4, 5, 6,  7,  8
  and 9 shall survive the termination of this Agreement.

  9. Governing Law and Jurisdiction. This Agreement will be governed and
  construed under the laws of the State of California without regard to choice
  of law principles, and the UN Convention on Contracts for the International
  Sale of Goods does not apply to this Agreement. The courts of California shall
  have exclusive jurisdiction of any dispute arising out of this Agreement.

  10. Modifications and Amendments. Meta may modify this Agreement from time to
  time by posting a revised version at
  https://huggingface.co/facebook/multi-token-prediction/LICENSE; provided that
  they are similar in spirit to the current version of the Agreement, but may
  differ in detail to address new problems or concerns. All such changes will be
  effective immediately. Your continued use of the Materials after any
  modification to this Agreement constitutes your agreement to such
  modification. Except as provided in this Agreement, no other modification or
  addition to any provision of this Agreement will be binding unless it is in
  writing and signed by an authorized representative of both you and Meta.
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Meta Privacy
  Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
library_name: multi_token_prediction

Multi-token prediction models and baselines

Models accompanying the research paper "Better & Faster Large Language Models via Multi-token Prediction" (https://arxiv.org/abs/2404.19737).

Included are the following four 7B parameter models trained on code:

  • baseline model (n=1) trained on 200B tokens of code: 7B_200B_1/
  • multi-token prediction model (n=4) trained on 200B tokens of code: 7B_200B_4/
  • baseline model (n=1) trained on 1T tokens of code: 7B_1T_1/
  • multi-token prediction model (n=4) trained on 1T tokens of code: 7B_1T_4/

Tokenizer: standard Llama 2 SentencePiece tokenizer in tokenizer.model.

Quickstart

Install torch, fairscale, fire and sentencepiece and run

torchrun --nproc_per_node 1 example_completion.py --ckpt_dir 7B_200B_4/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 2

replacing 7B_200B_4/ with the checkpoint directory you want to run.

Format

The PyTorch state_dicts are compatible with the Llama format: the layers of the shared trunk and the next-token prediction head layer are numbered contiguously. Additional prediction heads for tokens further in the future are named extra_heads and can be ignored for standard autoregressive inference.
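
As a rough illustration of ignoring the extra heads, the sketch below drops every entry whose key starts with extra_heads from a state_dict-like mapping. It uses a toy dictionary with placeholder string values instead of a real checkpoint, and the exact key names (including the assumption that extra_heads appears as a key prefix) are hypothetical; inspect the keys of the actual checkpoint before relying on this.

```python
def strip_extra_heads(state_dict, prefix="extra_heads"):
    """Return a copy of state_dict without the additional prediction heads.

    Assumption: the extra heads are stored under keys that start with
    `prefix`. Verify this against the real checkpoint keys.
    """
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}


# Toy stand-in for a loaded checkpoint. Values would normally be tensors;
# all key names here are hypothetical and only illustrate the filtering.
toy_state_dict = {
    "tok_embeddings.weight": "tensor0",
    "layers.0.attention.wq.weight": "tensor1",
    "extra_heads.0.attention.wq.weight": "tensor2",
    "output.weight": "tensor3",
}

filtered = strip_extra_heads(toy_state_dict)
for key in sorted(filtered):
    print(key)
```

The original mapping is left untouched, so the full set of heads stays available if multi-head outputs are needed later.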

The implementation of forward() in llama/model.py provides an additional argument return_all_heads. If set, the additional prediction heads are called and the logits are returned in shape (batch_size, seq_len, n_future_tokens, vocab_size). Otherwise, the logits have shape (batch_size, seq_len, 1, vocab_size).
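
The two output shapes can be illustrated with a small NumPy sketch. It does not call the model: a random array stands in for the logits, the sizes are arbitrary, and treating index 0 along the head axis as the next-token head is an assumption that should be checked against llama/model.py.

```python
import numpy as np

# Arbitrary toy sizes; n_future_tokens=4 mirrors the n=4 checkpoints.
batch_size, seq_len, n_future_tokens, vocab_size = 2, 5, 4, 11

# Stand-in for the output with return_all_heads set.
rng = np.random.default_rng(0)
all_logits = rng.standard_normal(
    (batch_size, seq_len, n_future_tokens, vocab_size)
)

# Assumption: index 0 on the head axis is the next-token head.
# Keeping the axis reproduces the shape returned without return_all_heads.
default_like = all_logits[:, :, :1, :]       # (batch, seq, 1, vocab)
next_token_logits = all_logits[:, :, 0, :]   # (batch, seq, vocab)

# Greedy token id proposed by each head at the last position.
greedy_per_head = all_logits[:, -1, :, :].argmax(axis=-1)  # (batch, n_future)

print(default_like.shape, next_token_logits.shape, greedy_per_head.shape)
```

Standard autoregressive decoding only needs next_token_logits; the per-head predictions are what a self-speculative decoding scheme, as discussed in the paper, would draft from.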

Citation

Gloeckle, F., Idrissi, B. Y., Rozière, B., Lopez-Paz, D., & Synnaeve, G. (2024). Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737.

Bibtex entry:

@article{gloeckle2024better,
  title={Better \& faster large language models via multi-token prediction},
  author={Gloeckle, Fabian and Idrissi, Badr Youbi and Rozi{\`e}re, Baptiste and Lopez-Paz, David and Synnaeve, Gabriel},
  journal={arXiv preprint arXiv:2404.19737},
  year={2024}
}

Feedback and comments

Please report risks as indicated in the Acceptable Use Policy and address bugs and any other comments to the corresponding authors as indicated in the research paper.