Domain Representative Keywords Selection: A Probabilistic Approach
- URL: http://arxiv.org/abs/2203.10365v1
- Date: Sat, 19 Mar 2022 18:04:12 GMT
- Title: Domain Representative Keywords Selection: A Probabilistic Approach
- Authors: Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang, Yunyao Li,
Lucian Popa, ChengXiang Zhai
- Abstract summary: We propose a probabilistic approach to select a subset of \textit{target domain representative keywords} from a candidate set, contrasting with a context domain.
We introduce an \textit{optimization algorithm} for selecting the subset from the generated candidate distribution.
Experiments on multiple domains demonstrate the superiority of our approach over other baselines for the tasks of keyword summary generation and trending keywords selection.
- Score: 39.24258854355122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a probabilistic approach to select a subset of
\textit{target domain representative keywords} from a candidate set,
contrasting with a context domain. Such a task is crucial for many downstream
tasks in natural language processing. To contrast the target domain with the
context domain, we adapt the \textit{two-component mixture model} concept to
generate a distribution over the candidate keywords. The distribution assigns
greater importance to the \textit{distinctive} keywords of the target domain
than to the common keywords shared with the context domain. To ensure the
\textit{representativeness} of the selected keywords for the target domain, we
introduce an \textit{optimization algorithm} that selects the subset from the
generated candidate distribution. We show that the optimization algorithm can
be implemented efficiently with a near-optimal approximation guarantee. Finally,
extensive experiments on multiple domains demonstrate the superiority of our
approach over other baselines for the tasks of keyword summary generation and
trending keywords selection.
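The abstract names two algorithmic ingredients without giving details at this level: a two-component mixture that contrasts the target domain with a context domain, and a subset-selection routine with a near-optimal approximation guarantee. The Python sketch below illustrates one plausible reading of each piece; the unigram language models, the fixed mixing weight `lam`, and the facility-location objective with an assumed similarity function `sim` are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

def distinctive_keyword_distribution(target_tokens, context_tokens, lam=0.5, iters=20):
    """Two-component mixture sketch: each target-domain token is assumed
    to be drawn either from a domain-specific unigram model theta (to be
    estimated) or from a fixed context/background model p_bg, with mixing
    weight lam. EM pushes probability mass in theta toward words that are
    distinctive of the target domain. Illustrative only."""
    tf = Counter(target_tokens)
    bg = Counter(context_tokens)
    bg_total = sum(bg.values())
    p_bg = {w: (bg[w] + 1) / (bg_total + len(tf)) for w in tf}  # smoothed background
    total = sum(tf.values())
    theta = {w: c / total for w, c in tf.items()}               # init from raw frequencies
    for _ in range(iters):
        # E-step: responsibility of the domain component for each word
        resp = {w: lam * theta[w] / (lam * theta[w] + (1 - lam) * p_bg[w]) for w in tf}
        # M-step: re-estimate theta from expected domain-specific counts
        norm = sum(tf[w] * resp[w] for w in tf)
        theta = {w: tf[w] * resp[w] / norm for w in tf}
    return theta  # higher mass = more distinctive of the target domain

def greedy_select(theta, sim, k):
    """Greedy maximization of a facility-location objective
    f(S) = sum_w theta[w] * max_{s in S} sim(w, s), which is monotone
    submodular, so greedy enjoys the classic (1 - 1/e) approximation
    guarantee -- the flavor of near-optimal bound the abstract alludes
    to (the paper's exact objective may differ). `sim` is an assumed
    keyword-similarity function, e.g. cosine similarity of embeddings."""
    selected = []
    covered = {w: 0.0 for w in theta}  # best similarity to any selected keyword so far
    for _ in range(k):
        gain = lambda c: sum(theta[w] * max(0.0, sim(w, c) - covered[w]) for w in theta)
        best = max((c for c in theta if c not in selected), key=gain)
        selected.append(best)
        for w in theta:
            covered[w] = max(covered[w], sim(w, best))
    return selected

# Toy usage (purely illustrative): with an identity similarity,
# greedy simply picks the k highest-theta keywords.
target = "deep learning neural network training deep model".split()
context = "the a of learning model data the of".split()
theta = distinctive_keyword_distribution(target, context)
picked = greedy_select(theta, sim=lambda a, b: float(a == b), k=3)
```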
Related papers
- Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z)
- Keyword Targeting Optimization in Sponsored Search Advertising: Combining Selection and Matching [0.0]
An optimal keyword targeting strategy guarantees reaching the right population effectively.
This paper aims to address the keyword targeting problem, which is challenging because of incomplete information about historical advertising performance indices.
Experimental results show that (a) BB-KSM outperforms seven baselines in terms of profit, and (b) its superiority grows as the budget increases.
arXiv Detail & Related papers (2022-10-19T03:37:32Z)
- Searching for Optimal Subword Tokenization in Cross-domain NER [19.921518007163]
In this work, we introduce a subword-level solution, X-Piece, for input word-level distribution shift in NER.
Specifically, we re-tokenize the input words of the source domain to approach the target subword distribution, which is formulated and solved as an optimal transport problem.
Experimental results show the effectiveness of the proposed method based on BERT-tagger on four benchmark NER datasets.
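The X-Piece summary frames re-tokenization as matching the source word distribution to the target subword distribution via optimal transport. As a hedged illustration of that machinery (a generic entropy-regularized OT solver, not the authors' code), Sinkhorn iterations over two empirical distributions look like this:

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.1, iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.
    a: source marginal, shape (m,)  -- e.g., word-level frequencies
    b: target marginal, shape (n,)  -- e.g., desired subword frequencies
    cost: (m, n) matrix; cost[i, j] = price of mapping item i to item j
    Returns an (m, n) transport plan whose marginals match a and b."""
    K = np.exp(-cost / reg)             # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)               # scale columns toward marginal b
        u = a / (K @ v)                 # scale rows toward marginal a
    return u[:, None] * K * v[None, :]  # diag(u) @ K @ diag(v)
```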
arXiv Detail & Related papers (2022-06-07T14:39:31Z)
- CA-UDA: Class-Aware Unsupervised Domain Adaptation with Optimal Assignment and Pseudo-Label Refinement [84.10513481953583]
Unsupervised domain adaptation (UDA) focuses on the selection of good pseudo-labels as surrogates for the missing labels in the target data.
However, source-domain bias that degrades the pseudo-labels can still exist, since a network shared by the source and target domains is typically used for pseudo-label selection.
We propose CA-UDA to improve the quality of the pseudo-labels and UDA results with optimal assignment, a pseudo-label refinement strategy and class-aware domain alignment.
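One standard way to realize the "optimal assignment" step mentioned above (offered here as a sketch of the general technique, not as CA-UDA's actual pipeline) is to match target-feature clusters to source classes by solving a minimum-cost assignment with the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_clusters_to_classes(target_centroids, class_prototypes):
    """Match each target cluster to a source class one-to-one by
    minimizing total centroid-to-prototype distance; the resulting
    mapping turns cluster memberships into class pseudo-labels.
    target_centroids: (k, d); class_prototypes: (k, d)."""
    diff = target_centroids[:, None, :] - class_prototypes[None, :, :]
    cost = (diff ** 2).sum(-1)                 # squared Euclidean costs
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    mapping = np.empty(len(rows), dtype=int)
    mapping[rows] = cols
    return mapping  # mapping[i] = class assigned to cluster i
```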
arXiv Detail & Related papers (2022-05-26T18:45:04Z)
- A Structured Span Selector [100.0808682810258]
We propose a novel grammar-based structured span selection model.
We evaluate our model on two popular span prediction tasks: coreference resolution and semantic role labeling.
arXiv Detail & Related papers (2022-05-08T23:58:40Z)
- Using Optimal Transport as Alignment Objective for fine-tuning Multilingual Contextualized Embeddings [7.026476782041066]
We propose using Optimal Transport (OT) as an alignment objective during fine-tuning to improve multilingual contextualized representations.
This approach does not require word-alignment pairs prior to fine-tuning and instead learns the word alignments within context in an unsupervised manner.
arXiv Detail & Related papers (2021-10-06T16:13:45Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach, called the hyperplane-based approach, for the automatic extraction of domain-specific words.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach reduces the dimensionality of the corpus by 90% and outperforms mutual information.
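The summary leaves the mechanics implicit; one plausible reading of a "hyperplane-based" scorer (an assumption for illustration, not necessarily the paper's exact procedure) fits a linear separator between domain and general-corpus documents and ranks vocabulary terms by the weight the hyperplane puts on them:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def rank_domain_words(domain_docs, general_docs, top_k=100):
    """Fit a hyperplane separating domain docs (label 1) from general
    docs (label 0); terms with large positive weights are candidate
    domain-specific words, while near-zero terms are candidates for a
    domain-specific stop word list."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(list(domain_docs) + list(general_docs))
    y = [1] * len(domain_docs) + [0] * len(general_docs)
    clf = LinearSVC().fit(X, y)
    vocab = vec.get_feature_names_out()
    weights = clf.coef_.ravel()
    order = weights.argsort()[::-1]            # most domain-specific first
    return [(vocab[i], float(weights[i])) for i in order[:top_k]]
```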
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- Affinity Space Adaptation for Semantic Segmentation Across Domains [57.31113934195595]
In this paper, we address the problem of unsupervised domain adaptation (UDA) in semantic segmentation.
Motivated by the fact that the source and target domains share invariant semantic structures, we propose to exploit such invariance across domains.
We develop two affinity space adaptation strategies: affinity space cleaning and adversarial affinity space alignment.
arXiv Detail & Related papers (2020-09-26T10:28:11Z)
- Keywords lie far from the mean of all words in local vector space [5.040463208115642]
In this work, we follow a different path, detecting keywords in a text document by modeling the main distribution of the document's words using local word vector representations.
We confirm the high performance of our approach compared to strong baselines and state-of-the-art unsupervised keyword extraction methods.
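A minimal sketch of the headline idea, assuming pre-computed local word vectors (the paper derives these representations from the document itself):

```python
import numpy as np

def keywords_far_from_mean(word_vectors, top_k=10):
    """word_vectors: dict mapping each document word to its local
    vector. Following the premise that keywords lie far from the mean
    of all words, rank words by distance from the mean vector."""
    words = list(word_vectors)
    M = np.stack([word_vectors[w] for w in words])
    dists = np.linalg.norm(M - M.mean(axis=0), axis=1)
    order = dists.argsort()[::-1]              # farthest first
    return [words[i] for i in order[:top_k]]
```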
arXiv Detail & Related papers (2020-08-21T14:42:33Z)
- Keyword-Attentive Deep Semantic Matching [1.8416014644193064]
We propose a keyword-attentive approach to improve deep semantic matching.
We first leverage domain tags from a large corpus to generate a domain-enhanced keyword dictionary.
During model training, we propose a new negative sampling approach based on keyword coverage between the input pair.
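A hedged sketch of what negative sampling "based on keyword coverage" could look like (the candidate pool, keyword sets, and Jaccard weighting are assumptions for illustration): candidates that share many dictionary keywords with the query are sampled more often, yielding harder negatives.

```python
import random

def sample_negatives(query_keywords, candidates, candidate_keywords, n=1):
    """Sample hard negatives in proportion to keyword coverage, so the
    matcher cannot separate pairs on keywords alone.
    query_keywords: set of dictionary keywords found in the query.
    candidate_keywords: dict candidate -> set of its keywords."""
    def coverage(c):  # Jaccard overlap between query and candidate keywords
        kws = candidate_keywords[c]
        return len(query_keywords & kws) / max(1, len(query_keywords | kws))
    weights = [coverage(c) + 1e-6 for c in candidates]  # epsilon keeps every candidate reachable
    return random.choices(candidates, weights=weights, k=n)
```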
arXiv Detail & Related papers (2020-03-11T10:18:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.