---
license: apache-2.0
language:
- zh
tags:
- bert
- feature-extraction
- text2vec
datasets:
- shibing624/nli_zh
pipeline_tag: sentence-similarity
---

Introduction:

Reference: https://github.com/shibing624/text2vec

This model follows the CoSENT architecture and uses hfl/chinese-roberta-wwm-ext as the base model. It was fine-tuned again on the Chinese STS-B dataset, and max_seq_length was extended from the original 128 to 512.

Evaluation Spearman correlation (eval_spearman): 0.833

---

Downstream tasks:

The model can be loaded with either the text2vec library or the sentence-transformers library.

Text embeddings:

```python
>>> from text2vec import SentenceModel, EncoderType
>>> model = SentenceModel('EricLee/text2vec-roberta-512', encoder_type=EncoderType.FIRST_LAST_AVG, max_seq_length=512)
>>> embedding = model.encode("今天天气不错啊")  # "The weather is nice today"
>>> embedding.shape
(768,)
```
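`EncoderType.FIRST_LAST_AVG` selects how per-token hidden states are pooled into a single sentence vector. As a rough illustration, the sketch below assumes this pooling mean-pools the first and the last transformer layer outputs over non-padding tokens and then averages the two vectors. It is a NumPy approximation with hypothetical names (`masked_mean`, `first_last_avg`, random toy arrays), not text2vec's own code, which may differ in details such as how padding is handled.

```python
import numpy as np


def masked_mean(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Mean over the sequence axis, ignoring padding positions.

    hidden: [batch, seq_len, dim]; mask: [batch, seq_len], 1 for real tokens, 0 for padding.
    """
    weights = mask[:, :, None].astype(hidden.dtype)     # [batch, seq_len, 1]
    summed = (hidden * weights).sum(axis=1)             # [batch, dim]
    counts = np.clip(weights.sum(axis=1), 1e-9, None)   # [batch, 1], avoids division by zero
    return summed / counts


def first_last_avg(first_layer: np.ndarray, last_layer: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hypothetical FIRST_LAST_AVG pooling: average of the mean-pooled first and last layers."""
    return (masked_mean(first_layer, mask) + masked_mean(last_layer, mask)) / 2.0


# Toy input: 2 sentences, up to 4 tokens, 768-dim states (the hidden size of a base RoBERTa).
rng = np.random.default_rng(0)
first = rng.normal(size=(2, 4, 768))
last = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])  # the second sentence has 2 padding tokens
emb = first_last_avg(first, last, mask)
print(emb.shape)  # (2, 768)
```

For the sentence-similarity task, two such sentence vectors are then typically compared with cosine similarity.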
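The CoSENT architecture trains the encoder directly on cosine similarity: for every two sentence pairs where pair i has a higher gold score than pair j, it penalizes the model when cos_j exceeds cos_i, via loss = log(1 + Σ exp(λ(cos_j − cos_i))). The NumPy sketch below assumes a scale of λ = 20 (a commonly used default); the function name `cosent_loss` and the toy scores are hypothetical, and this is an illustration of the loss, not the training code used for this model.

```python
import numpy as np


def cosent_loss(cos_sim: np.ndarray, labels: np.ndarray, scale: float = 20.0) -> float:
    """CoSENT loss over a batch of sentence pairs.

    cos_sim: [n] predicted cosine similarity for each sentence pair.
    labels:  [n] gold similarity scores (only their relative order matters).
    """
    scaled = cos_sim * scale
    # diff[i, j] = scale * (cos_j - cos_i)
    diff = scaled[None, :] - scaled[:, None]
    # Keep only (i, j) where pair i should rank above pair j.
    ranked = labels[:, None] > labels[None, :]
    terms = diff[ranked]
    # log(1 + sum(exp(terms))), computed stably as logaddexp over [0, *terms].
    return float(np.logaddexp.reduce(np.concatenate(([0.0], terms))))


labels = np.array([5.0, 3.0, 0.0])
good = np.array([0.9, 0.5, 0.1])  # predictions in the same order as the labels
bad = np.array([0.1, 0.5, 0.9])   # predictions in the reverse order
print(cosent_loss(good, labels) < cosent_loss(bad, labels))  # True
```

Because only the ordering of the labels enters the loss, the gold scores need no rescaling to the cosine range, which suits graded STS-B labels.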