The IEEE BigData 2024 suicide detection on social media competition aims to leverage advanced natural language processing and machine learning techniques to detect suicide risk on social media. The core objective of this challenge is to develop models that accurately classify the level of suicide risk expressed in social media posts. The competition provided a dataset of 500 labeled and 1,500 unlabeled samples, requiring participants to address data scarcity and class imbalance while exploring the effectiveness of various model architectures.
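As a concrete illustration of the class-imbalance issue, one common remedy is inverse-frequency class weighting in the training loss. The sketch below shows this in PyTorch; the four-class label distribution is a hypothetical assumption, not taken from the competition data.

```python
import torch
import torch.nn as nn

# Hypothetical counts per risk level over the 500 labeled posts
# (illustrative only; the real class balance is not given here).
label_counts = torch.tensor([230.0, 150.0, 80.0, 40.0])

# Inverse-frequency weights, normalized to average 1 across classes.
weights = label_counts.sum() / (len(label_counts) * label_counts)

# Weighted cross-entropy penalizes mistakes on rare classes more heavily.
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)           # a batch of classifier outputs
labels = torch.randint(0, 4, (8,))   # gold risk-level indices
loss = criterion(logits, labels)
print(loss.item())
```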
The approaches adopted by participating teams break down as follows:

| Model Type | Number of Teams | Main Models |
| --- | --- | --- |
| Base Language Models | 11 | BERT, RoBERTa, DeBERTa, mental-longformer, SVM… |
| Large Language Models | 5 | Phi 3.5, Claude-3.5-Sonnet, bloomz-3b, Qwen2-72B-Instruct, Qwen2-max, Llama 3-8B, Llama 3.1-8B, Gemma 2-9B, Gemma 2-27B, GPT-4, GPT-4o, GPT-4o-mini, GPT-4-turbo |
The large-language-model approaches fall into two categories, illustrated by the sketches after the table:

| Category | Models |
| --- | --- |
| Prompt engineering (in-context learning) | Qwen2-72B-Instruct, Qwen2-max, Claude-3.5-Sonnet, GPT-4, GPT-4o, GPT-4o-mini, GPT-4-turbo |
| Fine-tuning | Llama 3-8B, Llama 3.1-8B, Gemma 2-9B, Gemma 2-27B, GPT-4o |
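For the prompt-engineering category, a typical setup is few-shot (in-context) classification. The sketch below illustrates the pattern with the OpenAI Python client; the model choice, demonstration posts, and four-level label set are assumptions for illustration, not the prompts any team actually used.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical few-shot demonstrations; teams would draw these
# from the 500 labeled training posts.
few_shot = (
    "Post: I've been feeling down lately, but my friends help.\n"
    "Risk level: indicator\n\n"
    "Post: I can't stop thinking about ending it all.\n"
    "Risk level: ideation\n\n"
)

post = "Nothing matters anymore and I've started giving my things away."

response = client.chat.completions.create(
    model="gpt-4o",  # one of the models listed in the table
    temperature=0,   # deterministic output for classification
    messages=[
        {"role": "system",
         "content": "Classify the suicide risk level of a social media post. "
                    "Answer with exactly one of: indicator, ideation, "
                    "behavior, attempt."},
        {"role": "user", "content": few_shot + f"Post: {post}\nRisk level:"},
    ],
)
print(response.choices[0].message.content)
```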
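For the fine-tuning category, open-weights models such as Llama 3-8B are commonly adapted with parameter-efficient methods like LoRA. The sketch below shows that pattern with Hugging Face `transformers` and `peft`; the hyperparameters and the four-class head are illustrative assumptions, not any team's reported configuration.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B"  # an open model from the table

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token

# Classification head with an assumed four risk levels.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=4
)
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA trains small adapter matrices instead of all 8B parameters,
# which keeps fine-tuning feasible on the small labeled set.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```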
Large Language Models vs. Base Language Models: Among large language models, the open-source Llama family is the most popular choice, followed by Gemma 2; commercial usage is dominated by GPT-4 variants. Approaches built on base language models primarily use RoBERTa. Judging from the submitted results, large language models achieve good performance with relatively few task-specific modifications, suggesting a higher performance floor. However, because the space of adaptation techniques for large language models is comparatively narrow, combining them with base language models offers more room for innovation. Ultimately, regardless of the approach, achieving strong results requires novel, well-designed improvements.
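One concrete way to combine the two model families, suggested by the 1,500 unlabeled samples, is pseudo-labeling: an LLM annotates the unlabeled posts, and a base model such as RoBERTa is then fine-tuned on the enlarged set. The sketch below outlines that pipeline under stated assumptions; `llm_predict` is a hypothetical placeholder for a prompting call like the one above, and the down-weighting factor is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    label: str
    weight: float  # gold labels can be trusted more than pseudo labels

def llm_predict(post: str) -> str:
    """Placeholder for an LLM call (see the prompting sketch above);
    it returns a fixed label here so the pipeline runs end to end."""
    return "ideation"

gold = [Example("I feel hopeless and alone.", "ideation", weight=1.0)]
unlabeled = ["Lately I've started giving my things away."]

# Step 1: pseudo-label the unlabeled pool with the LLM, at reduced weight.
pseudo = [Example(p, llm_predict(p), weight=0.5) for p in unlabeled]

# Step 2: fine-tune a base model (e.g. RoBERTa) on gold + pseudo examples,
# using the per-example weights in its training loss.
train_set = gold + pseudo
print(f"training examples: {len(train_set)}")
```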
The IEEE BigData 2024 suicide detection on social media competition attracted research teams from around the world, showcasing the potential of natural language processing and machine learning for addressing critical social issues. Participating teams tackled challenges such as data scarcity and class imbalance with innovative methods, exploring how both base language models and large language models perform on suicide risk detection. Future competitions will continue to drive progress in this field, focusing on innovations in multimodal analysis, cross-lingual detection, and real-time intervention.