---
license: apache-2.0
datasets:
- google/docci
- google/imageinwords
- ProGamerGov/synthetic-dataset-1m-dalle3-high-quality-captions
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- art
base_model: gokaygokay/Florence-2-SD3-Captioner
inference: false
---

The original model is [here](https://huggingface.co/gokaygokay/Florence-2-SD3-Captioner). A tagger for local environments is [here](https://huggingface.co/John6666/local_gokaygokay_Florence-2-SD3-Captioner_Tagger).

```python
# recipe: quantize the original model to 8-bit and save it locally
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig
import torch
import json

model_id = 'gokaygokay/Florence-2-SD3-Captioner'
save_path = 'gokaygokay-Florence-2-SD3-Captioner-8bit'

# load the processor and the model with 8-bit (LLM.int8) quantization
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,
    low_cpu_mem_usage=True,
    quantization_config=BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0,
        llm_int8_enable_fp32_cpu_offload=True,
        llm_int8_skip_modules=['lm_head'],  # keep the output head unquantized
    ),
)

# save the quantized model and the processor
processor.save_pretrained(save_path)
model.save_pretrained(save_path, safe_serialization=True)

# patch the saved config so the vision tower's model_type is set to 'davit'
config = {}
with open(f'{save_path}/config.json') as f:
    config = json.load(f)
config['vision_config']['model_type'] = 'davit'
with open(f'{save_path}/config.json', 'w') as f:
    json.dump(config, f, indent=2)
```
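
For reference, below is a minimal inference sketch against the quantized checkpoint produced by the recipe above. It assumes `bitsandbytes` is installed and a CUDA device is available, that `gokaygokay-Florence-2-SD3-Captioner-8bit` is the local folder written by the recipe, and that the `<DESCRIPTION>` task prompt matches the original Florence-2-SD3-Captioner card; `example.jpg` is a placeholder image path.

```python
# usage sketch (assumptions: bitsandbytes installed, CUDA available,
# quantized checkpoint saved locally by the recipe above)
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

save_path = 'gokaygokay-Florence-2-SD3-Captioner-8bit'

processor = AutoProcessor.from_pretrained(save_path, trust_remote_code=True)
# the 8-bit quantization settings are picked up from the saved config
model = AutoModelForCausalLM.from_pretrained(save_path, trust_remote_code=True, device_map='auto')

image = Image.open('example.jpg').convert('RGB')  # placeholder path
prompt = '<DESCRIPTION>' + 'Describe this image in great detail.'

inputs = processor(text=prompt, images=image, return_tensors='pt').to(model.device)
generated_ids = model.generate(
    input_ids=inputs['input_ids'],
    pixel_values=inputs['pixel_values'],
    max_new_tokens=1024,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
caption = processor.post_process_generation(
    generated_text, task='<DESCRIPTION>', image_size=(image.width, image.height)
)
print(caption)
```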