IEEE BigData 2024 Cup: Detection of suicide risk on social media
The IEEE BigData Cup Challenge, centered this year on suicide ideation detection, is an important activity that has been held annually since 2013 under the auspices of the IEEE International Conference on Big Data (https://bigdataieee.org/BigData2024/index.html). The competition spans several months, culminating in the announcement of the winners at the IEEE BigData 2024 conference, scheduled for December 15-18, 2024, in Washington DC, USA. This year's challenge is centered around social media activity; specifically, participants are tasked with detecting the level of suicide risk associated with posts made by users on these platforms. This endeavor highlights the growing significance of big data in mental health and the crucial role of predictive analytics in proactive intervention strategies.
The topic of this year's data science competition is suicide risk detection from Reddit posts. The dataset contains 2,000 Reddit posts: 500 posts labeled with one of four suicide risk levels (indicator, ideation, behavior, attempt) and 1,500 unlabeled posts. Each post is represented by its text content, and the task for the competition participants is to develop a predictive model that accurately classifies each post into one of the four suicide risk levels based on its text. Such a model could play a crucial role in identifying individuals at risk of suicide and providing timely intervention and support.
Authors of selected challenge reports will be invited to extend their work for publication in the conference proceedings (after reviews by the Organizing Committee) and presentation at the conference. The invited teams will be chosen based on their final rank, the innovativeness of their approach, and the quality of the submitted report.
The task in this challenge is to design an accurate method for classifying Reddit posts into four suicide risk levels: indicator, ideation, behavior, and attempt. The available training data contains 500 instances (Reddit posts) with labeled suicide risk levels and 1,500 unlabeled Reddit posts. Each instance represents a Reddit post, and the text content of the post is provided. Test data containing 100 instances is provided in the same format.
Leaderboard: The quality of predictions will be evaluated using the weighted F1 score, which considers both precision and recall while accounting for the class imbalance. Preliminary results will be published or updated on the public leaderboard each morning.
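As a rough illustration of the metric, the sketch below computes a weighted F1 score in plain Python. It assumes the usual support-weighted definition (per-class F1 averaged with weights proportional to the number of true instances of each class, as in scikit-learn's `average="weighted"` option); the organizers' exact implementation is not stated on this page. The function name `weighted_f1` and the toy labels are invented for illustration only.

```python
from collections import Counter

# The four risk levels named in the task description.
LABELS = ["indicator", "ideation", "behavior", "attempt"]


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += f1 * support[label] / total
    return score


# Toy labels, invented purely for illustration (not competition data).
y_true = ["ideation", "ideation", "indicator", "behavior", "attempt", "ideation"]
y_pred = ["ideation", "indicator", "indicator", "behavior", "ideation", "ideation"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class contributes in proportion to its support, frequent risk levels influence the score more than rare ones, so a model can score reasonably well while still missing a rare class entirely.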
Please submit your predictions for the test set as an .xlsx file containing exactly 100 rows via the prediction file submission section. Each row should have three columns: 1. the index, 2. the predicted suicide risk level label (chosen from {indicator, ideation, behavior, attempt}), and 3. the probability distribution corresponding to this prediction (e.g., [0.29, 0.62, 0.04, 0.05]). Ensure that the order of your predictions matches the order of the test set instances. Here is an example:
index | suicide risk | probability distribution
0 | ideation | [0.29, 0.62, 0.04, 0.05]
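Before exporting to .xlsx, it may help to sanity-check the rows. The sketch below is one possible check, not an official validator. It assumes that indices start at 0 and are consecutive (as the example row suggests), that the distribution is stored as text such as "[0.29, 0.62, 0.04, 0.05]", and that its four probabilities sum to 1. The helper name `validate_rows` is hypothetical, and writing the actual .xlsx file (for instance with a spreadsheet library) is left out.

```python
import json
import math

LABELS = ("indicator", "ideation", "behavior", "attempt")
EXPECTED_ROWS = 100  # the page asks for exactly 100 rows


def validate_rows(rows, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found in (index, label, distribution_text) rows."""
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    for position, (index, label, dist_text) in enumerate(rows):
        # Assumption: indices are 0-based and follow the test set order.
        if index != position:
            problems.append(f"row {position}: index is {index}")
        if label not in LABELS:
            problems.append(f"row {position}: unknown label {label!r}")
        try:
            dist = json.loads(dist_text)
        except (TypeError, ValueError):
            problems.append(f"row {position}: distribution is not a list literal")
            continue
        if not isinstance(dist, list) or len(dist) != len(LABELS):
            problems.append(f"row {position}: expected {len(LABELS)} probabilities")
            continue
        # Assumption: the distribution is a proper probability vector.
        if not math.isclose(sum(dist), 1.0, abs_tol=1e-6):
            problems.append(f"row {position}: probabilities do not sum to 1")
    return problems


# Placeholder rows that reuse the example values from the task description.
rows = [(i, "ideation", "[0.29, 0.62, 0.04, 0.05]") for i in range(EXPECTED_ROWS)]
print(validate_rows(rows))
```

The page does not state which position in the distribution corresponds to which label, so the sketch deliberately does not check that the predicted label matches the largest probability.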
Final evaluation: The final evaluation will be conducted after the competition ends, at a time to be announced later, using a new set of 100 posts. The evaluation will still utilize the weighted F1 score. Only teams that submit their source code and a report describing their approach by the deadline will qualify for the final evaluation. Reports and source code will be submitted via Google Drive or Baidu Drive. We will notify you in advance by email about the submission instructions.
Please submit the prediction file created by your team. Multiple submissions are permitted. The file format should be .xlsx, and the file name must be: YourTeamName.xlsx. The scores of the uploaded prediction results will be updated on the leaderboard the following day. For a detailed explanation of the content in your prediction file, please refer to the 'Task Description' section.
Based on the submitted works, we have evaluated the teams according to the following selection criteria:
Model performance (30%)
Approach innovation (40%)
Report quality (30%)
Following this evaluation, we are pleased to invite the top 10 teams (we have extended 2 additional slots) to submit papers for the conference. These teams are:
The Dual (75.436), BioNLP@WCM (74.389), Detection of Suicide (71.315), mukumuku (71.015), Calculators (70.523), EEEAT (67.967), BNU AI and Mental Health (66.824), LifeWatcher (65.584), kubapok (64.653), MIDAS (62.449).
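The percentage weights in the selection criteria suggest a weighted sum of three sub-scores. The sketch below shows that arithmetic under two assumptions that this page does not confirm: each criterion is scored on a 0-100 scale, and the three are combined linearly. The sub-scores for the hypothetical team are invented and do not correspond to any real team.

```python
# Weights taken from the selection criteria listed above.
WEIGHTS = {"performance": 0.30, "innovation": 0.40, "report": 0.30}


def composite_score(sub_scores, weights=WEIGHTS):
    """Linear combination of per-criterion scores (assumed 0-100 scale)."""
    return sum(weights[name] * sub_scores[name] for name in weights)


# Invented sub-scores for a hypothetical team, for illustration only.
hypothetical_team = {"performance": 76.0, "innovation": 70.0, "report": 80.0}
print(round(composite_score(hypothetical_team), 3))
```

With innovation carrying the largest weight, a team's position under these criteria can differ from its position on a ranking based on model performance alone.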
Paper submission system: https://wi-lab.com/cyberchair/2024/bigdata24/index.php (The chairs of IEEE BigData conference will add the link of big data cup paper submission later in the paper submission system)
Paper format: The same as for main conference papers: 10 pages, double-column IEEE format.
Rank | Team Name | Final Score (model performance in the final evaluation)
1 | Detection of Suicide | 0.7605
2 | kubapok | 0.7551
3 | mukumuku | 0.7505
4 | BioNLP@WCM | 0.7463
5 | Calculators | 0.7341
6 | The Dual | 0.7312
7 | BNU AI and Mental Health | 0.7108
8 | MindFlow | 0.7072
9 | EEEAT | 0.6989
10 | MIDAS | 0.6983
11 | PotatoTomato | 0.6915
12 | LifeWatcher | 0.5528
13 | Data Science and Decision Making Lab BGU | 0.5496
Cash prizes will be awarded to the top three teams. We will contact the winning teams later.
Start of the competition, the task is revealed, May 1, 2024
Deadline for contest teams to submit email of intent, June 10, 2024
Deadline for submitting the source code & the detailed report of the solutions, End of the competition, Aug 31, 2024
Announcement of winning teams, Sending invitations for submitting papers for the special track at the IEEE BigData 2024 conference, Sept 15, 2024
Deadline for submitting invited papers, Oct 10, 2024
Notification of paper acceptance, Oct 30, 2024
Camera-ready of accepted papers due, Nov 15, 2024
The IEEE BigData 2024 conference, Washington DC, USA, Dec 15-18, 2024
Attractive cash prizes will be awarded to the top-performing teams.
1000 USD for the winning solution
500 USD for the 2nd place solution
250 USD for the 3rd place solution
Once you have accepted the Data Usage Agreement, please send us your team's information in the following format via email. We will respond to your inquiry and provide you with the dataset:
Team name.
List of Team Members with Affiliations.
Contact Person and Email Address.
Data Usage Agreement
In consideration of the promises and mutual covenants contained in this Agreement, Recipient agrees to the terms and conditions below.
Article 1. Data Set and Grant of License
1.1 The Dataset has been compiled by members of The Hong Kong Polytechnic University and comprises publicly available data from Reddit with the purpose of detecting users at suicide risk.
1.2 The Hong Kong Polytechnic University grants Recipient a non-exclusive license to use the Data Set solely for not-for-profit educational and/or research purposes. Uses of the Data Set include, but are not limited to, viewing parts or the whole of the Data Set; comparing data or content from the Data Set with data or content in other data sets; verifying research results with the Data Set; and extracting any part of the Data Set for use in Recipient publications or Recipient research in accordance with the terms of this Agreement.
Article 2. Recipient Representations
2.1 Recipient represents that it is not bound by any pre-existing legal obligations or other applicable law(s) that prevent Recipient from receiving or using the Data Set.
2.2 Recipient shall provide proper citation and acknowledgement to The Hong Kong Polytechnic University as the source of the Data Set in Recipient publications, presentations or other public dissemination of work utilizing the Data Set.
@article{li2022suicide,
title={Suicide risk level prediction and suicide trigger detection: A benchmark dataset},
author={Li, Jun and Chen, Xinhong and Lin, Zehang and Yang, Kaiqi and Leong, Hong Va and Yu, Nancy Xiaonan and Li, Qing},
journal={HKIE Transactions Hong Kong Institution of Engineers},
volume={29},
number={4},
pages={268--282},
year={2022},
publisher={Taylor \& Francis}
}
2.3 Recipient shall use the Data Set for non-commercial, educational and/or research purposes only.
2.4 Recipient shall provide The Hong Kong Polytechnic University with immediate notice in writing of any breach of this Agreement, and if identification of any user in the Data Set becomes known to Recipient, Recipient shall also immediately use its reasonable best efforts to mitigate any harm or damage from such breach.
Article 3. Recipient Restrictions
3.1 Recipient shall not deduce or obtain information from the Data Set that results in Recipient or any third party(ies) directly or indirectly identifying any research subjects with or without the aid of other information acquired elsewhere.
3.2 Recipient shall not use the Data Set in any way prohibited by applicable local, state or federal laws.
3.3 Recipient shall not modify the Data Set, except as allowed hereunder.
3.4 Recipient shall not transfer any part of the Data Set to any third party without prior written consent from The Hong Kong Polytechnic University.
3.5 Recipient shall not make or use the Data Set for any commercial purpose.
Prof. Li Qing, The Hong Kong Polytechnic University
Dr. Hong Va Leong, The Hong Kong Polytechnic University
Li Jun, The Hong Kong Polytechnic University
Wang Xiangmeng, University of Technology Sydney
Yan Yifei, City University of Hong Kong
Zhang Ziyan, The Hong Kong Polytechnic University
For registration inquiries, contact Zhang Ziyan at ariana.zhang@connect.polyu.hk
For inquiries about competition rankings, contact Yan Yifei at yfyan8-c@my.cityu.edu.hk
For other inquiries, contact Alex at hialexlee@hotmail.com
Q: On the challenge website it says "Deadline for contest teams to submit letter of intent, June 10, 2024". I could, however, not find a draft/template for the letter of intent.
A: Participants do not need to submit a letter of intent. To register, simply send us an email by June 10, 2024. Once you receive an email with the dataset link, your registration is confirmed.
Q: I uploaded our team's predictions to the challenge website earlier today, but the leaderboard score hasn't been updated yet.
A: We typically update the leaderboard daily. If you have any concerns, please notify us by email, and we will update the score on the leaderboard immediately.
Q: May I know how many attempts each team has for predicting the test partition labels?
A: Teams are free to submit their prediction results as many times as needed until they achieve satisfactory performance.