How to train a model with AutoModelForSequenceClassification?

#20
by jerfie - opened

Hello! I want to train Phi-2 with AutoModelForSequenceClassification, but I get the following error with this code.

from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/phi-2", 
    num_labels=2,
    device_map={"":0},
    trust_remote_code=True,
)
base_model.config.pretraining_tp = 1  # tensor parallelism degree (setting carried over from Llama examples)
base_model.config.pad_token_id = tokenizer.pad_token_id  # tokenizer defined elsewhere in the script
# Output
ValueError: Unrecognized configuration class <class 'transformers_modules.phi-2.configuration_phi.PhiConfig'> for this kind of AutoModel: AutoModelForSequenceClassification.
Model type should be one of AlbertConfig, BartConfig, BertConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BloomConfig, CamembertConfig, CanineConfig, LlamaConfig, ConvBertConfig, CTRLConfig, Data2VecTextConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, ElectraConfig, ErnieConfig, ErnieMConfig, EsmConfig, FalconConfig, FlaubertConfig, FNetConfig, FunnelConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTJConfig, IBertConfig, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LlamaConfig, LongformerConfig, LukeConfig, MarkupLMConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MobileBertConfig, MPNetConfig, MptConfig, MraConfig, MT5Config, MvpConfig, NezhaConfig, NystromformerConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PerceiverConfig, PersimmonConfig, PhiConfig, PLBartConfig, QDQBertConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, SqueezeBertConfig, T5Config, TapasConfig, TransfoXLConfig, UMT5Config, XLMConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, YosoConfig.
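
As far as I can tell, the Auto classes resolve a checkpoint through a static model_type → class mapping baked into the installed transformers version, so a config class loaded via trust_remote_code is not in that mapping. A small diagnostic sketch to check this (the mapping name is taken from the transformers source; please correct me if I am reading it wrong):

from transformers import AutoConfig
from transformers.models.auto.modeling_auto import (
    MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES,
)

# The Auto classes map config.model_type -> a concrete class name; a remote
# config pulled in through trust_remote_code is not part of that table.
config = AutoConfig.from_pretrained("microsoft/phi-2", trust_remote_code=True)
print(type(config))        # remote PhiConfig from transformers_modules
print(config.model_type)   # model type string declared by the checkpoint
print(MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES.get(config.model_type))  # None -> unsupported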

So I tried to train Phi-2 with PhiForSequenceClassification, using the following code.

from transformers import PhiForSequenceClassification

base_model = PhiForSequenceClassification.from_pretrained(
    "microsoft/phi-2", 
    num_labels=2,
    device_map={"":0},
    trust_remote_code=True,
)
base_model.config.pretraining_tp = 1  # tensor parallelism degree (setting carried over from Llama examples)
base_model.config.pad_token_id = tokenizer.pad_token_id

But the model output is NaN, and then the eval loss is NaN too.

# Output
{'eval_loss': nan, 'eval_accuracy': 0.3618421052631579, 'eval_roc_auc': 0.5, 'eval_runtime': 19.2035, 'eval_samples_per_second': 39.576, 'eval_steps_per_second': 19.788, 'epoch': 0.03}                                                        
{'eval_loss': nan, 'eval_accuracy': 0.3618421052631579, 'eval_roc_auc': 0.5, 'eval_runtime': 19.2699, 'eval_samples_per_second': 39.44, 'eval_steps_per_second': 19.72, 'epoch': 0.04}       
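
One thing that might be related (I am not sure): the Phi-2 tokenizer does not ship a pad token, so tokenizer.pad_token_id can be None, and the config.pad_token_id assignment above would then set nothing useful. A small sketch of making the padding explicit (reusing the EOS token as padding is my own assumption and has to match the data collator):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
if tokenizer.pad_token is None:
    # Fall back to the EOS token for padding (assumption, not from the model card)
    tokenizer.pad_token = tokenizer.eos_token

base_model.config.pad_token_id = tokenizer.pad_token_id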

I tried changing the fp16 argument in TrainingArguments and the torch_dtype model argument, but it is still the same problem.
Can you give me some advice on training PhiForSequenceClassification?
Thank you :)

# Trial 1
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    learning_rate=1e-6,
    ...
    fp16=True
    # bf16=True,
)
# Trial 2
base_model = PhiForSequenceClassification.from_pretrained(
    "microsoft/phi-2", 
    num_labels=2,
    device_map={"":0},
    trust_remote_code=True,
    torch_dtype=torch.float16
)
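
If it helps narrow this down, my next guess (not something I have verified) is that the freshly initialized classification head overflows in half precision. What I would try next is keeping the weights in float32, or using bf16 mixed precision instead of fp16, roughly like this:

import torch
from transformers import PhiForSequenceClassification, TrainingArguments

# Sketch only: full-precision weights so the untrained score head cannot
# overflow; bf16 mixed precision is optional and assumes supporting hardware.
base_model = PhiForSequenceClassification.from_pretrained(
    "microsoft/phi-2",
    num_labels=2,
    device_map={"": 0},
    trust_remote_code=True,
    torch_dtype=torch.float32,  # or torch.bfloat16
)

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,      # same OUTPUT_DIR as above
    learning_rate=1e-6,
    bf16=True,                  # only if the GPU supports bf16; otherwise drop both flags
    fp16=False,
)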

I also hope that you will add support for AutoModelForSequenceClassification with Phi-2.

Great! Is there a PR in transformers to fix this?

I have refactored some code from the Phi-1.5 SequenceClassification model to work for Phi-2:

https://colab.research.google.com/drive/1y_CFog1i97Ctwre41kUnKuTGFWgzGWte?usp=sharing

@jerfie @Asaf-Yehudai @hendrydong

Microsoft org

Hello everyone!

This will be fixed once we integrate the Phi-based repositories with the HF codebase. The integration will include support for PhiForSequenceClassification.
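
Once that lands, loading through the Auto class should work directly, roughly like this (sketch only, the exact release is still to be confirmed):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Expected usage after the integration; trust_remote_code should no longer be
# needed once the model type is natively supported by the library.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/phi-2",
    num_labels=2,
)
model.config.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id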

Best regards,
Gustavo.

gugarosa changed discussion status to closed
