The IEEE BigData 2024 suicide detection on social media competition aims to leverage advanced natural language processing and machine learning techniques to detect suicide risk on social media. The core objective of this challenge is to develop models that accurately classify the level of suicide risk expressed in social media posts. The competition provided a dataset of 500 labeled and 1,500 unlabeled samples, requiring participants to address data scarcity and class imbalance while exploring the effectiveness of various model architectures.
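As a concrete illustration of the class-imbalance issue, one common remedy is inverse-frequency class weighting in the training loss. The sketch below shows this in PyTorch; the four-class label distribution is a hypothetical assumption, not taken from the competition data.

```python
import torch
import torch.nn as nn

# Hypothetical counts per risk level over the 500 labeled posts
# (illustrative only; the real class balance is not given here).
label_counts = torch.tensor([230.0, 150.0, 80.0, 40.0])

# Inverse-frequency weights, normalized to average 1 across classes.
weights = label_counts.sum() / (len(label_counts) * label_counts)

# Weighted cross-entropy penalizes mistakes on rare classes more heavily.
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 4)           # a batch of classifier outputs
labels = torch.randint(0, 4, (8,))   # gold risk-level indices
loss = criterion(logits, labels)
print(loss.item())
```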
The approaches adopted by participating teams break down as follows:

| Model Type | Number of Teams | Main Models |
| --- | --- | --- |
| Base Language Models | 11 | BERT, RoBERTa, DeBERTa, mental-longformer, SVM… |
| Large Language Models | 5 | Phi 3.5, Claude-3.5-Sonnet, bloomz-3b, Qwen2-72B-Instruct, Qwen2-max, Llama 3-8B, Llama 3.1-8B, Gemma 2-9B, Gemma 2-27B, GPT-4, GPT-4o, GPT-4o-mini, GPT-4-turbo |
The large-language-model approaches fall into two categories, illustrated by the sketches after the table:

| Category | Models |
| --- | --- |
| Prompt engineering (in-context learning) | Qwen2-72B-Instruct, Qwen2-max, Claude-3.5-Sonnet, GPT-4, GPT-4o, GPT-4o-mini, GPT-4-turbo |
| Fine-tuning | Llama 3-8B, Llama 3.1-8B, Gemma 2-9B, Gemma 2-27B, GPT-4o |
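For the prompt-engineering category, a typical setup is few-shot (in-context) classification. The sketch below illustrates the pattern with the OpenAI Python client; the model choice, demonstration posts, and four-level label set are assumptions for illustration, not the prompts any team actually used.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical few-shot demonstrations; teams would draw these
# from the 500 labeled training posts.
few_shot = (
    "Post: I've been feeling down lately, but my friends help.\n"
    "Risk level: indicator\n\n"
    "Post: I can't stop thinking about ending it all.\n"
    "Risk level: ideation\n\n"
)

post = "Nothing matters anymore and I've started giving my things away."

response = client.chat.completions.create(
    model="gpt-4o",  # one of the models listed in the table
    temperature=0,   # deterministic output for classification
    messages=[
        {"role": "system",
         "content": "Classify the suicide risk level of a social media post. "
                    "Answer with exactly one of: indicator, ideation, "
                    "behavior, attempt."},
        {"role": "user", "content": few_shot + f"Post: {post}\nRisk level:"},
    ],
)
print(response.choices[0].message.content)
```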
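For the fine-tuning category, open-weights models such as Llama 3-8B are commonly adapted with parameter-efficient methods like LoRA. The sketch below shows that pattern with Hugging Face `transformers` and `peft`; the hyperparameters and the four-class head are illustrative assumptions, not any team's reported configuration.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B"  # an open model from the table

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token

# Classification head with an assumed four risk levels.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=4
)
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA trains small adapter matrices instead of all 8B parameters,
# which keeps fine-tuning feasible on the small labeled set.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% trainable
```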
Large Language Models vs. Base Language Models: Among large language models, the open-source Llama family is the most popular choice, followed by Gemma 2; commercial usage is dominated by GPT-4 variants. Approaches built on base language models primarily use RoBERTa. Judging from the submitted results, large language models achieve good performance with relatively few task-specific modifications, suggesting a higher performance floor. However, because the space of adaptation techniques for large language models is comparatively narrow, combining them with base language models offers more room for innovation. Ultimately, regardless of the approach, achieving strong results requires novel, well-designed improvements.
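One concrete way to combine the two model families, suggested by the 1,500 unlabeled samples, is pseudo-labeling: an LLM annotates the unlabeled posts, and a base model such as RoBERTa is then fine-tuned on the enlarged set. The sketch below outlines that pipeline under stated assumptions; `llm_predict` is a hypothetical placeholder for a prompting call like the one above, and the down-weighting factor is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    label: str
    weight: float  # gold labels can be trusted more than pseudo labels

def llm_predict(post: str) -> str:
    """Placeholder for an LLM call (see the prompting sketch above);
    it returns a fixed label here so the pipeline runs end to end."""
    return "ideation"

gold = [Example("I feel hopeless and alone.", "ideation", weight=1.0)]
unlabeled = ["Lately I've started giving my things away."]

# Step 1: pseudo-label the unlabeled pool with the LLM, at reduced weight.
pseudo = [Example(p, llm_predict(p), weight=0.5) for p in unlabeled]

# Step 2: fine-tune a base model (e.g. RoBERTa) on gold + pseudo examples,
# using the per-example weights in its training loss.
train_set = gold + pseudo
print(f"training examples: {len(train_set)}")
```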
The IEEE BigData 2024 suicide detection on social media competition attracted research teams from around the world, showcasing the potential of natural language processing and machine learning for addressing critical social issues. Participating teams tackled challenges such as data scarcity and class imbalance with innovative methods, exploring how both base language models and large language models perform on suicide risk detection. Future competitions will continue to drive progress in this field, focusing on innovations in multimodal analysis, cross-lingual detection, and real-time intervention.