IEEE BigData 2024 Cup: Detection of suicide risk on social media

The IEEE BigData 2024 Cup Challenge on suicide ideation detection is part of the IEEE BigData Cup series, which has been held annually since 2013 under the auspices of the IEEE International Conference on Big Data (https://bigdataieee.org/BigData2024/index.html). The competition spans several months and culminates with the announcement of the winners at the IEEE BigData 2024 conference, scheduled for December 15-18, 2024, in Washington DC, USA. This year's challenge is centered on social media activity: participants are tasked with detecting the level of suicide risk associated with posts made by users on these platforms. The challenge highlights the growing significance of big data in mental health and the role of predictive analytics in proactive intervention strategies.

The topic of this year's data science competition is suicide risk detection from Reddit posts. The dataset contains 2,000 Reddit posts: 500 labeled with one of four suicide risk levels (indicator, ideation, behavior, attempt) and 1,500 unlabeled. Each post is represented by its text content, and the task for participants is to develop a predictive model that accurately classifies posts into the four risk levels based on that text. Such a model could play a crucial role in identifying individuals at risk of suicide and providing timely intervention and support.

Authors of selected challenge reports will be invited to extend their work for publication in the conference proceedings (after reviews by the Organizing Committee) and presentation at the conference. The invited teams will be chosen based on their final rank, the innovativeness of their approach, and the quality of the submitted report.

The task in this challenge is to design an accurate method for classifying Reddit posts into four suicide risk levels: indicator, ideation, behavior, and attempt. The available training data contains 500 instances (Reddit posts) with labeled suicide risk levels and 1,500 unlabeled Reddit posts. Each instance represents a single Reddit post, for which the text content is provided. The test data, containing 100 instances, is provided in the same format.

Leaderboard: The quality of predictions will be evaluated using the weighted F1 score, which considers both precision and recall while accounting for the class imbalance. Preliminary results will be published or updated on the public leaderboard each morning.
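For reference, the weighted F1 score is conventionally the mean of the per-class F1 scores, with each class weighted by its number of true instances. The sketch below follows that standard definition in plain Python; the organizers' actual scoring script is not published here, so the function name and the handling of classes with no instances are assumptions.

```python
from collections import Counter

# The four risk levels named in the task description.
LABELS = ("indicator", "ideation", "behavior", "attempt")


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one instance is required")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class that is never true and never predicted scores 0.
        f1 = 2 * tp / denom if denom else 0.0
        score += support[label] / total * f1
    return score
```

Because each class is weighted by its support, frequent risk levels contribute more to the final score than rare ones.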

Please submit your predictions for the test set as a .xlsx file containing exactly 100 rows via the prediction file submission section. Each row should have three columns: 1. the index, 2. the predicted suicide risk level label (chosen from {indicator, ideation, behavior, attempt}), and 3. the probability distribution corresponding to this prediction (e.g., [0.29, 0.62, 0.04, 0.05]). Ensure that the order of your predictions matches the order of the test set instances. Here is an example:

index   suicide risk   probability distribution
0       ideation       [0.29, 0.62, 0.04, 0.05]
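Before uploading, it may help to check the rows against the format described above. The following is a minimal sketch using only the Python standard library. The function name `check_submission` is hypothetical, and the checks that indices start at 0 and that the probabilities sum to 1 are assumptions inferred from the example row, not rules stated by the organizers. The order of the four probabilities within the list is not specified here, so it is not checked.

```python
import math

# The four allowed labels from the task description.
LABELS = ("indicator", "ideation", "behavior", "attempt")


def check_submission(rows, expected_rows=100):
    """Return a list of problems found in (index, label, probabilities) rows."""
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    for position, (index, label, probs) in enumerate(rows):
        # Assumption: indices run 0, 1, 2, ... in test-set order.
        if index != position:
            problems.append(f"row {position}: index {index} is out of order")
        if label not in LABELS:
            problems.append(f"row {position}: unknown label {label!r}")
        if len(probs) != len(LABELS):
            problems.append(f"row {position}: expected {len(LABELS)} probabilities")
        elif any(p < 0 or p > 1 for p in probs):
            problems.append(f"row {position}: probability outside [0, 1]")
        # Assumption: a probability distribution should sum to 1.
        elif not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
            problems.append(f"row {position}: probabilities do not sum to 1")
    return problems
```

An empty list means no problems were detected. Writing the checked rows to YourTeamName.xlsx is a separate step and depends on the spreadsheet library you use.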

Final evaluation: The final evaluation will be conducted after the competition ends, at a time to be announced later, using a new set of 100 posts. It will again use the weighted F1 score. Only teams that submit their source code and a report describing their approach by the deadline will qualify for the final evaluation. Reports and source code are to be submitted via Google Drive or Baidu Drive. We will send the submission instructions by email in advance.

Please submit the prediction file created by your team. Multiple submissions are permitted. The file format should be .xlsx, and the file name must be: YourTeamName.xlsx. The scores of the uploaded prediction results will be updated on the leaderboard the following day. For a detailed explanation of the content in your prediction file, please refer to the 'Task Description' section.

Based on the submitted works, we have evaluated the teams according to the following selection criteria:

Following this evaluation, we are pleased to invite the top 10 teams (two additional slots have been added) to submit papers for the conference. These teams are: The Dual (75.436), BioNLP@WCM (74.389), Detection of Suicide (71.315), mukumuku (71.015), Calculators (70.523), EEEAT (67.967), BNU AI and Mental Health (66.824), LifeWatcher (65.584), kubapok (64.653), MIDAS (62.449).

Rank  Team Name                                 Final Score (model performance in the final evaluation)
1     Detection of Suicide                      0.7605
2     kubapok                                   0.7551
3     mukumuku                                  0.7505
4     BioNLP@WCM                                0.7463
5     Calculators                               0.7341
6     The Dual                                  0.7312
7     BNU AI and Mental Health                  0.7108
8     MindFlow                                  0.7072
9     EEEAT                                     0.6989
10    MIDAS                                     0.6983
11    PotatoTomato                              0.6915
12    LifeWatcher                               0.5528
13    Data Science and Decision Making Lab BGU  0.5496

Cash prizes will be awarded to the top three teams. We will contact the winning teams later.

Attractive cash prizes will be awarded to the top-performing teams.

Once you have accepted the Data Usage Agreement, please send us your team's information in the following format via email. We will respond to your inquiry and provide you with the dataset:

For registration inquiries, contact Zhang Ziyan at ariana.zhang@connect.polyu.hk

For inquiries about competition rankings, contact Yan Yifei at yfyan8-c@my.cityu.edu.hk

For other inquiries, contact Alex at hialexlee@hotmail.com

Q: On the challenge website it says "Deadline for contest teams to submit letter of intent, June 10, 2024". I could, however, not find a draft/template for the letter of intent.
Q: I uploaded our team's predictions to the challenge website earlier today, but the leaderboard score hasn't been updated yet.
Q: May I know how many attempts each team has for predicting the test partition labels?