Dataset Format
====================

Self-disclosure Dataset: This dataset contains synthetic posts generated by large language models (LLMs) and labelled with 19 self-disclosure types:

1. Name
2. Birthdate/DoB
3. Location
4. Country
5. Marital Status
6. Religion
7. Ethnicity/Race
8. Gender
9. Parenthood
10. Age
11. Sexuality
12. Medical Information
13. Employment
14. Relationship
15. Family
16. Gender-Age
17. Mental Health
18. Physical Appearance
19. Degree/Designation

The dataset contains a total of 2,888 posts: 954 generated by Llama 2, 900 by Llama 3, and 1,034 by Zephyr.

Every post has the following fields:
- id: ID of the post
- text: Content of the post
- label: The self-disclosure types (e.g., Sexuality, Location, Gender) assigned to each labelled character span in the text

====================
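The record layout described above can be illustrated with a minimal Python sketch. This description does not specify the file format or the internal structure of `label`, so the sketch assumes each label is a character span with `start`/`end` offsets (end-exclusive) and a `type` drawn from the 19 types. `SpanLabel`, `Post`, `extract_disclosures`, and the example post are hypothetical illustrations, not the dataset's published schema. The sketch also checks that the per-model post counts add up to the stated total.

```python
from dataclasses import dataclass

# The 19 self-disclosure types listed in this README.
SELF_DISCLOSURE_TYPES = {
    "Name", "Birthdate/DoB", "Location", "Country", "Marital Status",
    "Religion", "Ethnicity/Race", "Gender", "Parenthood", "Age",
    "Sexuality", "Medical Information", "Employment", "Relationship",
    "Family", "Gender-Age", "Mental Health", "Physical Appearance",
    "Degree/Designation",
}

# Post counts per generating model, as stated in this README.
POSTS_PER_MODEL = {"Llama 2": 954, "Llama 3": 900, "Zephyr": 1034}
assert sum(POSTS_PER_MODEL.values()) == 2888


@dataclass
class SpanLabel:
    # HYPOTHETICAL span layout: character offsets [start, end) into `text`.
    start: int
    end: int
    type: str


@dataclass
class Post:
    id: str                # ID of the post
    text: str              # content of the post
    label: list[SpanLabel]  # assumed list of labelled spans


def extract_disclosures(post: Post) -> list[tuple[str, str]]:
    """Return (type, span text) pairs after validating each assumed span."""
    pairs = []
    for span in post.label:
        if span.type not in SELF_DISCLOSURE_TYPES:
            raise ValueError(f"unknown self-disclosure type: {span.type}")
        if not (0 <= span.start < span.end <= len(post.text)):
            raise ValueError(f"span out of range: {span}")
        pairs.append((span.type, post.text[span.start:span.end]))
    return pairs


if __name__ == "__main__":
    # Hypothetical example post, not taken from the dataset.
    post = Post(
        id="example-1",
        text="I'm a nurse living in Leeds.",
        label=[SpanLabel(6, 11, "Employment"), SpanLabel(22, 27, "Location")],
    )
    print(extract_disclosures(post))
    # prints [('Employment', 'nurse'), ('Location', 'Leeds')]
```

Validating the type against the fixed set of 19 categories catches misspelled or unexpected labels early; if the released files store spans differently (for example, as token indices or nested dictionaries), only `SpanLabel` and the slicing logic would need to change.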