Conversation Kernels: A Flexible Mechanism to Learn Relevant Context for Online Conversation Understanding


Abstract

Understanding online conversations has attracted research attention with the growth of social networks and online discussion forums. Content analysis of posts and replies in online conversations is difficult because each individual utterance is usually short and may implicitly refer to other posts within the same conversation. Thus, understanding individual posts requires capturing the conversational context and dependencies between different parts of a conversation tree and then encoding the context dependencies between posts and comments/replies into the language model.
To this end, we propose a general-purpose mechanism to discover appropriate conversational context for various aspects of an online post in a conversation, such as whether it is informative, insightful, interesting or funny. Specifically, we design two families of Conversation Kernels, which explore different parts of the neighborhood of a post in the tree representing the conversation and, through this, build relevant conversational context that is appropriate for each task being considered. We apply our method to conversations crawled from slashdot.org, which allows users to apply very different labels to posts, such as ‘insightful’, ‘funny’, etc., and therefore provides an ideal experimental platform to study whether a framework such as Conversation Kernels is general-purpose and flexible enough to be adapted to widely differing conversation understanding tasks.
We perform extensive experiments and find that context-augmented conversation kernels can significantly outperform transformer-based baselines, with absolute improvements of up to 20% in accuracy and up to 19% in macro-F1 score. Our evaluations also show that conversation kernels outperform state-of-the-art large language models, including GPT-4. We also demonstrate that conversation kernels generalize, offering a general-purpose approach that flexibly handles distinctly different conversation understanding tasks in a unified manner.
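
As a rough illustration of what "exploring different parts of the neighborhood of a post" in a conversation tree can mean, the sketch below collects three kinds of neighbors for a post: its ancestors, its siblings, and its descendants. This is not the paper's Conversation Kernels implementation. The tree encoding (a `parents` dictionary), the function names, and the hop limits are all assumptions made for this example, and the step that would weight or select among these neighbors for a given task is left out entirely.

```python
from collections import defaultdict


def build_children(parents):
    """Map each post id to the list of its direct replies."""
    children = defaultdict(list)
    for post, parent in parents.items():
        if parent is not None:
            children[parent].append(post)
    return children


def ancestors(post, parents, max_hops):
    """Walk up from `post`; return at most `max_hops` ancestors, nearest first."""
    out = []
    current = parents.get(post)
    while current is not None and len(out) < max_hops:
        out.append(current)
        current = parents.get(current)
    return out


def siblings(post, parents, children):
    """Return the other replies to the same parent as `post`."""
    parent = parents.get(post)
    if parent is None:
        return []
    return [p for p in children.get(parent, []) if p != post]


def descendants(post, children, max_depth):
    """Collect replies at most `max_depth` levels below `post`, level by level."""
    out, frontier = [], [post]
    for _ in range(max_depth):
        frontier = [c for p in frontier for c in children.get(p, [])]
        out.extend(frontier)
    return out


# Hypothetical toy conversation: post id -> parent id (None marks the root).
parents = {
    "root": None,
    "a": "root",
    "b": "root",
    "a1": "a",
    "a2": "a",
    "a1x": "a1",
}
children = build_children(parents)

up_context = ancestors("a1", parents, max_hops=2)
side_context = siblings("a1", parents, children)
down_context = descendants("a", children, max_depth=2)
```

A context-aware classifier could then pair the text of a post with the text of whichever of these neighbor sets is most useful for the label in question (for example, ancestors for one label and siblings for another); how that choice is learned is exactly what the paper addresses and is not shown here.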

Dataset and Code

An anonymized version of the dataset and the code used in our paper are available to the research community.

  1. Slashdot Dataset: This dataset contains conversations crawled from Slashdot.org.

  2. Code: Implementation details and code for Conversation Kernels are available on GitHub.

You can find the format of the Slashdot dataset here.


Contact Us


If you are interested in using this data, please fill out the "Request specific data" form to get the link where you can download the data.

We are sharing the dataset under the terms and conditions specified below. Please note that submitting the form indicates that you accept these terms and conditions. In the form, please indicate which part of the dataset you need. If you do not receive an email notification for your logged request within 24 hours, please e-mail us at netsys.noreply[at]gmail.com.

Dataset Terms and Conditions

  1. You will use the data solely for the purpose of non-profit research or non-profit education.

  2. You will respect the privacy of end users and organizations that may be identified in the data. You will not attempt to reverse engineer, decrypt, de-anonymize, derive or otherwise re-identify anonymized information.

  3. You will not distribute the data beyond your immediate research group.

  4. If you create a publication using our datasets, please cite our paper as follows.

            @article{agarwal2025conversation,
              title={Conversation Kernels: A Flexible Mechanism to Learn Relevant Context for Online Conversation Understanding},
              author={Agarwal, Vibhor and Gupta, Arjoo and De, Suparna and Sastry, Nishanth},
              journal={arXiv preprint arXiv:2505.20482},
              year={2025}
            }