# Each tuple: (query, candidate passage, 'Yes'/'No' relevance label, score —
# likely the log-probability of the label token; this interpretation is
# inferred from the near-zero negative values, not stated in the dump itself).
[('how do bi-encoders work for sentence embeddings',
'SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features: Models based on large-pretrained language models, such as S(entence)BERT,\nprovide effective and efficient sentence embeddings that show high correlation\nto human similarity ratings, but lack interpretability. On the other hand,\ngraph metrics for graph-based meaning representations (e.g., Abstract Meaning\nRepresentation, AMR) can make explicit the semantic aspects in which two\nsentences are similar. However, such metrics tend to be slow, rely on parsers,\nand do not reach state-of-the-art performance when rating sentence similarity.\n In this work, we aim at the best of both worlds, by learning to induce\n$S$emantically $S$tructured $S$entence BERT embeddings (S$^3$BERT). Our\nS$^3$BERT embeddings are composed of explainable sub-embeddings that emphasize\nvarious semantic sentence features (e.g., semantic roles, negation, or\nquantification). We show how to i) learn a decomposition of the sentence\nembeddings into semantic features, through approximation of a suite of\ninterpretable AMR graph metrics, and how to ii) preserve the overall power of\nthe neural embeddings by controlling the decomposition learning process with a\nsecond objective that enforces consistency with the similarity ratings of an\nSBERT teacher model. In our experimental studies, we show that our approach\noffers interpretability -- while fully preserving the effectiveness and\nefficiency of the neural sentence embeddings.',
'Yes',
-0.05326408),
('how do bi-encoders work for sentence embeddings',
'Are Classes Clusters?: Sentence embedding models aim to provide general purpose embeddings for\nsentences. Most of the models studied in this paper claim to perform well on\nSTS tasks - but they do not report on their suitability for clustering. This\npaper looks at four recent sentence embedding models (Universal Sentence\nEncoder (Cer et al., 2018), Sentence-BERT (Reimers and Gurevych, 2019), LASER\n(Artetxe and Schwenk, 2019), and DeCLUTR (Giorgi et al., 2020)). It gives a\nbrief overview of the ideas behind their implementations. It then investigates\nhow well topic classes in two text classification datasets (Amazon Reviews (Ni\net al., 2019) and News Category Dataset (Misra, 2018)) map to clusters in their\ncorresponding sentence embedding space. While the performance of the resulting\nclassification model is far from perfect, it is better than random. This is\ninteresting because the classification model has been constructed in an\nunsupervised way. The topic classes in these real life topic classification\ndatasets can be partly reconstructed by clustering the corresponding sentence\nembeddings.',
'No',
-0.009535169),
('how do bi-encoders work for sentence embeddings',
"Semantic Composition in Visually Grounded Language Models: What is sentence meaning and its ideal representation? Much of the expressive\npower of human language derives from semantic composition, the mind's ability\nto represent meaning hierarchically & relationally over constituents. At the\nsame time, much sentential meaning is outside the text and requires grounding\nin sensory, motor, and experiential modalities to be adequately learned.\nAlthough large language models display considerable compositional ability,\nrecent work shows that visually-grounded language models drastically fail to\nrepresent compositional structure. In this thesis, we explore whether & how\nmodels compose visually grounded semantics, and how we might improve their\nability to do so.\n Specifically, we introduce 1) WinogroundVQA, a new compositional visual\nquestion answering benchmark, 2) Syntactic Neural Module Distillation, a\nmeasure of compositional ability in sentence embedding models, 3) Causal\nTracing for Image Captioning Models to locate neural representations vital for\nvision-language composition, 4) Syntactic MeanPool to inject a compositional\ninductive bias into sentence embeddings, and 5) Cross-modal Attention\nCongruence Regularization, a self-supervised objective function for\nvision-language relation alignment. We close by discussing connections of our\nwork to neuroscience, psycholinguistics, formal semantics, and philosophy.",
'No',
-0.008887106),
('how do bi-encoders work for sentence embeddings',
"Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions: Text embedding models from Natural Language Processing can map text data\n(e.g. words, sentences, documents) to supposedly meaningful numerical\nrepresentations (a.k.a. text embeddings). While such models are increasingly\napplied in social science research, one important issue is often not addressed:\nthe extent to which these embeddings are valid representations of constructs\nrelevant for social science research. We therefore propose the use of the\nclassic construct validity framework to evaluate the validity of text\nembeddings. We show how this framework can be adapted to the opaque and\nhigh-dimensional nature of text embeddings, with application to survey\nquestions. We include several popular text embedding methods (e.g. fastText,\nGloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct\nvalidity analyses. We find evidence of convergent and discriminant validity in\nsome cases. We also show that embeddings can be used to predict respondent's\nanswers to completely new survey questions. Furthermore, BERT-based embedding\ntechniques and the Universal Sentence Encoder provide more valid\nrepresentations of survey questions than do others. Our results thus highlight\nthe necessity to examine the construct validity of text embeddings before\ndeploying them in social science research.",
'No',
-0.008583762),
('how do bi-encoders work for sentence embeddings',
'Learning Probabilistic Sentence Representations from Paraphrases: Probabilistic word embeddings have shown effectiveness in capturing notions\nof generality and entailment, but there is very little work on doing the\nanalogous type of investigation for sentences. In this paper we define\nprobabilistic models that produce distributions for sentences. Our\nbest-performing model treats each word as a linear transformation operator\napplied to a multivariate Gaussian distribution. We train our models on\nparaphrases and demonstrate that they naturally capture sentence specificity.\nWhile our proposed model achieves the best performance overall, we also show\nthat specificity is represented by simpler architectures via the norm of the\nsentence vectors. Qualitative analysis shows that our probabilistic model\ncaptures sentential entailment and provides ways to analyze the specificity and\npreciseness of individual words.',
'No',
-0.011975748),
('how do bi-encoders work for sentence embeddings',
"Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings: Semantic sentence embeddings are usually supervisedly built minimizing\ndistances between pairs of embeddings of sentences labelled as semantically\nsimilar by annotators. Since big labelled datasets are rare, in particular for\nnon-English languages, and expensive, recent studies focus on unsupervised\napproaches that require not-paired input sentences. We instead propose a\nlanguage-independent approach to build large datasets of pairs of informal\ntexts weakly similar, without manual human effort, exploiting Twitter's\nintrinsic powerful signals of relatedness: replies and quotes of tweets. We use\nthe collected pairs to train a Transformer model with triplet-like structures,\nand we test the generated embeddings on Twitter NLP similarity tasks (PIT and\nTURL) and STSb. We also introduce four new sentence ranking evaluation\nbenchmarks of informal texts, carefully extracted from the initial collections\nof tweets, proving not only that our best model learns classical Semantic\nTextual Similarity, but also excels on tasks where pairs of sentences are not\nexact paraphrases. Ablation studies reveal how increasing the corpus size\ninfluences positively the results, even at 2M samples, suggesting that bigger\ncollections of Tweets still do not contain redundant information about semantic\nsimilarities.",
'No',
-0.01219046),
('how do bi-encoders work for sentence embeddings',
"How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation: Sentence encoders map sentences to real valued vectors for use in downstream\napplications. To peek into these representations - e.g., to increase\ninterpretability of their results - probing tasks have been designed which\nquery them for linguistic knowledge. However, designing probing tasks for\nlesser-resourced languages is tricky, because these often lack large-scale\nannotated data or (high-quality) dependency parsers as a prerequisite of\nprobing task design in English. To investigate how to probe sentence embeddings\nin such cases, we investigate sensitivity of probing task results to structural\ndesign choices, conducting the first such large scale study. We show that\ndesign choices like size of the annotated probing dataset and type of\nclassifier used for evaluation do (sometimes substantially) influence probing\noutcomes. We then probe embeddings in a multilingual setup with design choices\nthat lie in a 'stable region', as we identify for English, and find that\nresults on English do not transfer to other languages. Fairer and more\ncomprehensive sentence-level probing evaluation should thus be carried out on\nmultiple languages in the future.",
'No',
-0.015550519),
('how do bi-encoders work for sentence embeddings',
'Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences: Sentence embedding methods offer a powerful approach for working with short\ntextual constructs or sequences of words. By representing sentences as dense\nnumerical vectors, many natural language processing (NLP) applications have\nimproved their performance. However, relatively little is understood about the\nlatent structure of sentence embeddings. Specifically, research has not\naddressed whether the length and structure of sentences impact the sentence\nembedding space and topology. This paper reports research on a set of\ncomprehensive clustering and network analyses targeting sentence and\nsub-sentence embedding spaces. Results show that one method generates the most\nclusterable embeddings. In general, the embeddings of span sub-sentences have\nbetter clustering properties than the original sentences. The results have\nimplications for future sentence embedding models and applications.',
'No',
-0.012663184),
('how do bi-encoders work for sentence embeddings',
'Vec2Sent: Probing Sentence Embeddings with Natural Language Generation: We introspect black-box sentence embeddings by conditionally generating from\nthem with the objective to retrieve the underlying discrete sentence. We\nperceive of this as a new unsupervised probing task and show that it correlates\nwell with downstream task performance. We also illustrate how the language\ngenerated from different encoders differs. We apply our approach to generate\nsentence analogies from sentence embeddings.',
'Yes',
-0.004863006),
('how do bi-encoders work for sentence embeddings',
'Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings: Semantic representation learning for sentences is an important and\nwell-studied problem in NLP. The current trend for this task involves training\na Transformer-based sentence encoder through a contrastive objective with text,\ni.e., clustering sentences with semantically similar meanings and scattering\nothers. In this work, we find the performance of Transformer models as sentence\nencoders can be improved by training with multi-modal multi-task losses, using\nunpaired examples from another modality (e.g., sentences and unrelated\nimage/audio data). In particular, besides learning by the contrastive loss on\ntext, our model clusters examples from a non-linguistic domain (e.g.,\nvisual/audio) with a similar contrastive loss at the same time. The reliance of\nour framework on unpaired non-linguistic data makes it language-agnostic,\nenabling it to be widely applicable beyond English NLP. Experiments on 7\nsemantic textual similarity benchmarks reveal that models trained with the\nadditional non-linguistic (/images/audio) contrastive objective lead to higher\nquality sentence embeddings. This indicates that Transformer models are able to\ngeneralize better by doing a similar task (i.e., clustering) with unpaired\nexamples from different modalities in a multi-task fashion.',
'No',
-0.013869206)]
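
# ---------------------------------------------------------------------------
# A minimal sketch (an assumption, not the pipeline that actually produced
# the dump above) of how such (query, passage, label, score) tuples could be
# generated: a pointwise LLM relevance judge is asked "Yes or No?" for each
# query/passage pair, and the log-probability of the chosen answer token is
# kept as the score. The model choice ("gpt2"), the prompt template, and the
# `judge` helper are all illustrative placeholders.
# ---------------------------------------------------------------------------
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Hypothetical prompt template for pointwise relevance judging.
PROMPT = (
    "Query: {query}\n"
    "Passage: {passage}\n"
    "Is the passage relevant to the query? Answer Yes or No.\n"
    "Answer:"
)

def judge(query: str, passage: str) -> tuple[str, float]:
    """Return ('Yes' | 'No') and the log-probability of that answer token."""
    inputs = tokenizer(PROMPT.format(query=query, passage=passage),
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token
    logprobs = F.log_softmax(next_token_logits, dim=-1)
    # " Yes" and " No" are single tokens in GPT-2's BPE vocabulary; the
    # leading space matters because BPE folds it into the token.
    yes_id, no_id = tokenizer.encode(" Yes")[0], tokenizer.encode(" No")[0]
    if logprobs[yes_id] >= logprobs[no_id]:
        return "Yes", logprobs[yes_id].item()
    return "No", logprobs[no_id].item()

# Usage: scoring the abstracts above against the query would yield tuples
# shaped like the dump, e.g. ('how do bi-encoders ...', passage, 'No', -0.0138...).
query = "how do bi-encoders work for sentence embeddings"
passages = ["Vec2Sent: Probing Sentence Embeddings with Natural Language Generation: ..."]
results = [(query, p, *judge(query, p)) for p in passages]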