Prakhar Gupta

prakhar.gupta@epfl.ch +41 21 693 13 99
Nationality: Indian
Google Scholar Link
https://scholar.google.ch/citations?user=wRJyHpIAAAAJ&hl=en&oi=ao
EPFL IC IINFCOM MLO
INJ 338 (INJ Building)
Station 14
CH-1015 Lausanne
Web site: https://go.epfl.ch/edic_program
Fields of expertise
Education
B.Tech.
Aerospace Engineering
Indian Institute of Technology Kanpur
2010-14
M.Tech.
Computer Science and Engineering
Indian Institute of Technology Bombay
2014-16
Publications
Infoscience publications
Unsupervised learning of sentence embeddings
Learning Word Vectors for 157 Languages
Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance. A key ingredient in the successful application of these representations is to train them on very large corpora and to use these pre-trained models in downstream tasks. In this paper, we describe how we trained such high-quality word representations for 157 languages. We used two sources of data to train these models: the free online encyclopedia Wikipedia and data from the Common Crawl project. We also introduce three new word analogy datasets to evaluate these word vectors, for French, Hindi and Polish. Finally, we evaluate our pre-trained word vectors on 10 languages for which evaluation datasets exist, showing very strong performance compared to previous models.
- Fulltext: arXiv:1802.06893 (PDF)