---
library_name: peft
base_model: Qwen/Qwen2-1.5B-Instruct
pipeline_tag: text-generation
license: apache-2.0
---

# Model Card for WavGPT-1.0

## Model Details

### Model Description

- **Developed by:** Hack337
- **Model type:** qwen2
- **Finetuned from model:** Qwen/Qwen2-1.5B-Instruct

### Model Sources

- **Repository:** https://huggingface.co/Hack337/WavGPT-1.0
- **Demo:** https://huggingface.co/spaces/Hack337/WavGPT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to run inference on

# Load the merged model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]

# Build the chat prompt and tokenize it
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate a response and strip the prompt tokens from the output
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Use the code below to get started with the model on an Intel NPU.

```python
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

# Load the NPU-optimized model without LoRA
model = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-1.0-merged",
    use_cache=True,
    dtype=torch.float16  # use float16 on the NPU
).eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-1.0-merged")
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt handling
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]

# Convert the chat messages to the model's prompt format
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prefix = tokenizer([text], return_tensors="pt")["input_ids"].to("npu")

# Generation configuration
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

# Run inference on the NPU; the streamer prints tokens as they are generated
print("Run inference")
_ = model.generate(**generation_kwargs)
```

### Framework versions

- PEFT 0.11.1
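
The examples above load the merged checkpoint `Hack337/WavGPT-1.0-merged`. Because this card lists `peft` as the library and `Hack337/WavGPT-1.0` as the repository, the adapter can also be attached to the base model at load time. The snippet below is a minimal sketch, assuming `Hack337/WavGPT-1.0` hosts the LoRA adapter weights for `Qwen/Qwen2-1.5B-Instruct`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Assumption: Hack337/WavGPT-1.0 (the repository above) contains the PEFT/LoRA
# adapter trained on top of Qwen/Qwen2-1.5B-Instruct. Load the base model first,
# then attach the adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Hack337/WavGPT-1.0")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
```

The resulting `model` can be used with the same chat-template and `generate` calls shown above; merging is only needed if you want a standalone checkpoint without the PEFT dependency.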