[Simplified Chinese Books] Text Data Mining (English Edition) / 文本数据挖掘(英文版)

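Store catalog number: 3677502
Category: Simplified Chinese Books → Mainland China Books → Computers/Network Databases
Authors: Chengqing Zong (宗成庆), Rui Xia (夏睿), Jiajun Zhang (张家俊)
ISBN: 9787302590293
Publisher: Tsinghua University Press (清华大学出版社)
Publication date: 2021-10-01
Format: 16开 (16mo)  Binding: Paperback
Price: NT$ 655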

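Editor's Recommendation:
Text Data Mining (English Edition) addresses the practical needs of text mining tasks. It uses examples to explain, from first principles, the theory and algorithms behind the relevant techniques. The writing aims to be concise and accessible without dwelling on implementation details, so that readers who fully understand the basic principles can go on to build application systems.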
About the Book:
Text Data Mining offers a thorough and detailed introduction to the fundamental theories and methods of text data mining, ranging from preprocessing (for both Chinese and English texts), text representation, and feature selection to text classification and text clustering. It also presents the predominant applications of text data mining, such as topic models, sentiment analysis and opinion mining, topic detection and tracking, information extraction, and automatic text summarization.
About the Author:
Chengqing Zong is a professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences. He has served as chair for many prestigious conferences, such as ACL-IJCNLP, IJCAI, IJCAI-ECAI, AAAI, and COLING, and as associate editor for prestigious journals such as TALLIP and Machine Translation. He is the President of the Asian Federation of Natural Language Processing and a member of the International Committee on Computational Linguistics.
Table of Contents
1 Introduction
1.1 The Basic Concepts
1.2 Main Tasks of Text Data Mining
1.3 Existing Challenges in Text Data Mining
1.4 Overview and Organization of This Book
1.5 Further Reading
2 Data Annotation and Preprocessing
2.1 Data Acquisition
2.2 Data Preprocessing
2.3 Data Annotation
2.4 Basic Tools of NLP
2.4.1 Tokenization and POS Tagging
2.4.2 Syntactic Parser
2.4.3 N-gram Language Model
2.5 Further Reading
3 Text Representation
3.1 Vector Space Model
3.1.1 Basic Concepts
3.1.2 Vector Space Construction
3.1.3 Text Length Normalization
3.1.4 Feature Engineering
3.1.5 Other Text Representation Methods
3.2 Distributed Representation of Words
3.2.1 Neural Network Language Model
3.2.2 C&W Model
3.2.3 CBOW and Skip-Gram Model
3.2.4 Noise Contrastive Estimation and Negative Sampling
3.2.5 Distributed Representation Based on the Hybrid Character-Word Method
3.3 Distributed Representation of Phrases
3.3.1 Distributed Representation Based on the Bag-of-Words Model
3.3.2 Distributed Representation Based on Autoencoder
3.4 Distributed Representation of Sentences
3.4.1 General Sentence Representation
3.4.2 Task-Oriented Sentence Representation
3.5 Distributed Representation of Documents
3.5.1 General Distributed Representation of Documents
3.5.2 Task-Oriented Distributed Representation of Documents
3.6 Further Reading
4 Text Representation with Pretraining and Fine-Tuning
4.1 ELMo: Embeddings from Language Models
4.1.1 Pretraining Bidirectional LSTM Language Models
4.1.2 Contextualized ELMo Embeddings for Downstream Tasks
4.2 GPT: Generative Pretraining
4.2.1 Transformer
4.2.2 Pretraining the Transformer Decoder
4.2.3 Fine-Tuning the Transformer Decoder
4.3 BERT: Bidirectional Encoder Representations from Transformer
4.3.1 BERT: Pretraining
4.3.2 BERT: Fine-Tuning
4.3.3 XLNet: Generalized Autoregressive Pretraining
4.3.4 UniLM
4.4 Further Reading
5 Text Classification
5.1 The Traditional Framework of Text Classification
5.2 Feature Selection
5.2.1 Mutual Information
5.2.2 Information Gain
5.2.3 The Chi-Squared Test Method
5.2.4 Other Methods
5.3 Traditional Machine Learning Algorithms for Text Classification
5.3.1 Naïve Bayes
5.3.2 Logistic/Softmax and Maximum Entropy
5.3.3 Support Vector Machine
5.3.4 Ensemble Methods
5.4 Deep Learning Methods
5.4.1 Multilayer Feed-Forward Neural Network
5.4.2 Convolutional Neural Network
5.4.3 Recurrent Neural Network
5.5 Evaluation of Text Classification
5.6 Further Reading
6 Text Clustering
6.1 Text Similarity Measures
6.1.1 The Similarity Between Documents
6.1.2 The Similarity Between Clusters
6.2 Text Clustering Algorithms
6.2.1 K-Means Clustering
6.2.2 Single-Pass Clustering
6.2.3 Hierarchical Clustering
6.2.4 Density-Based Clustering
6.3 Evaluation of Clustering
6.3.1 External Criteria
6.3.2 Internal Criteria
6.4 Further Reading
7 Topic Model
7.1 The History of Topic Modeling
7.2 Latent Semantic Analysis
7.2.1 Singular Value Decomposition of the Term-by-Document Matrix
7.2.2 Conceptual Representation and Similarity Computation
7.3 Probabilistic Latent Semantic Analysis
7.3.1 Model Hypothesis
7.3.2 Parameter Learning
7.4 Latent Dirichlet Allocation
7.4.1 Model Hypothesis
7.4.2 Joint Probability
7.4.3 Inference in LDA
7.4.4 Inference for New Documents
7.5 Further Reading
8 Sentiment Analysis and Opinion Mining
8.1 History of Sentiment Analysis and Opinion Mining
8.2 Categorization of Sentiment Analysis Tasks
8.2.1 Categorization According to Task Output
8.2.2 According to Analysis Granularity
8.3 Methods for Document/Sentence-Level Sentiment Analysis
8.3.1 Lexicon- and Rule-Based Methods
8.3.2 Traditional Machine Learning Methods
8.3.3 Deep Learning Methods
8.4 Word-Level Sentiment Analysis and Sentiment Lexicon Construction
8.4.1 Knowledgebase-Based Methods
8.4.2 Corpus-Based Methods
8.4.3 Evaluation of Sentiment Lexicons
8.5 Aspect-Level Sentiment Analysis
8.5.1 Aspect Term Extraction
8.5.2 Aspect-Level Sentiment Classification
8.5.3 Generative Modeling of Topics and Sentiments
8.6 Special Issues in Sentiment Analysis
8.6.1 Sentiment Polarity Shift
8.6.2 Domain Adaptation
8.7 Further Reading
9 Topic Detection and Tracking
9.1 History of Topic Detection and Tracking
9.2 Terminology and Task Definition
9.2.1 Terminology
9.2.2 Task
9.3 Story/Topic Representation and Similarity Computation
9.4 Topic Detection
9.4.1 Online Topic Detection
9.4.2 Retrospective Topic Detection
9.5 Topic Tracking
9.6 Evaluation
9.7 Social Media Topic Detection and Tracking
9.7.1 Social Media Topic Detection
9.7.2 Social Media Topic Tracking
9.8 Bursty Topic Detection
9.8.1 Burst State Detection
9.8.2 Document-Pivot Methods
9.8.3 Feature-Pivot Methods
9.9 Further Reading
10 Information Extraction
10.1 Concepts and History
10.2 Named Entity Recognition
10.2.1 Rule-Based Named Entity Recognition
10.2.2 Supervised Named Entity Recognition Method
10.2.3 Semisupervised Named Entity Recognition Method
10.2.4 Evaluation of Named Entity Recognition Methods
10.3 Entity Disambiguation
10.3.1 Clustering-Based Entity Disambiguation Method
10.3.2 Linking-Based Entity Disambiguation
10.3.3 Evaluation of Entity Disambiguation
10.4 Relation Extraction
10.4.1 Relation Classification Using Discrete Features
10.4.2 Relation Classification Using Distributed Features
10.4.3 Relation Classification Based on Distant Supervision
10.4.4 Evaluation of Relation Classification
10.5 Event Extraction
10.5.1 Event Description Template
10.5.2 Event Extraction Method
10.5.3 Evaluation of Event Extraction
10.6 Further Reading
11 Automatic Text Summarization
11.1 Main Tasks in Text Summarization
11.2 Extraction-Based Summarization
11.2.1 Sentence Importance Estimation
11.2.2 Constraint-Based Summarization Algorithms
11.3 Compression-Based Automatic Summarization
11.3.1 Sentence Compression Method
11.3.2 Automatic Summarization Based on Sentence Compression
11.4 Abstractive Automatic Summarization
11.4.1 Abstractive Summarization Based on Information Fusion
11.4.2 Abstractive Summarization Based on the Encoder-Decoder Framework
11.5 Query-Based Automatic Summarization
11.5.1 Relevance Calculation Based on the Language Model
11.5.2 Relevance Calculation Based on Keyword Co-occurrence
11.5.3 Graph-Based Relevance Calculation Method
11.6 Crosslingual and Multilingual Automatic Summarization
11.6.1 Crosslingual Automatic Summarization
11.6.2 Multilingual Automatic Summarization
11.7 Summary Quality Evaluation and Evaluation Workshops
11.7.1 Summary Quality Evaluation Methods
11.7.2 Evaluation Workshops
11.8 Further Reading
References
Excerpt
Preface
With the rapid development and popularization of Internet and mobile communication technologies, text data mining has attracted much attention. In particular, with the wide use of new technologies such as cloud computing, big data, and deep learning, text mining has begun playing an increasingly important role in many application fields, such as opinion mining and medical and financial data analysis, showing broad application prospects.
Although I was supervising graduate students studying text classification and automatic summarization more than ten years ago, I did not have a clear understanding of the overall concept of text data mining and regarded the research topics only as specific applications of natural language processing. Professor Jiawei Han’s book Data Mining: Concepts and Techniques, published by Elsevier, Professor Bing Liu’s Web Data Mining, published by Springer, and other books have greatly benefited me. Every time I have listened to their talks or discussed these topics with them face to face, I have benefited immensely. I was inspired to write this book by the course Text Data Mining, which I was invited to teach to graduate students of the University of Chinese Academy of Sciences. At the end of 2015, I accepted the invitation and began to prepare the content design and selection of materials for the course. I had to study a large number of related papers, books, and other materials, and I began to think seriously about the rich connotation and extension of the term Text Data Mining. After more than a year of study, I started to compile the courseware. Through teaching practice, the outline of the concept gradually took shape.
Rui Xia and Jiajun Zhang, two talented young people, helped me materialize my original writing plan. Rui Xia received his master’s degree in 2007, was admitted to the Institute of Automation, Chinese Academy of Sciences, and studied for his Ph.D. degree under my supervision. He worked on sentiment classification and took it as the research topic of his Ph.D. dissertation. After he received his Ph.D. degree in 2011, his interests extended to opinion mining, text clustering and classification, topic modeling, event detection and tracking, and other related topics. He has published a series of influential papers in the field of sentiment analysis and opinion mining. He received the ACL 2019 outstanding paper award, and his paper on ensemble learning for sentiment classification has been cited more than 600 times. Jiajun Zhang joined our institute after he graduated from university in 2006 and studied in my group in pursuit of his Ph.D. degree. He has mainly engaged in machine translation research, but he has also performed well in many other research topics, such as multilingual automatic summarization, information extraction, and human–computer dialogue systems. Since 2016, he has been teaching some parts of the course on Natural Language Processing in cooperation with me, such as machine translation, automatic summarization, and text classification, at the University of Chinese Academy of Sciences; this course is very popular with students. With the solid theoretical foundation of these two talents and their keen scientific insights, I am gratified that many cutting-edge technical methods and research results could be verified, practiced, and included in this book.
It took more than three years, from early 2016 to June 2019, to complete the Chinese version of this book. During those three years, we devoted most of our holidays, weekends, and other spare time to writing it. Enduring the numerous revisions and even rewrites was painful, but we were also very happy. We started to translate the Chinese version into English in the second half of 2019. Some more recent topics, including BERT (bidirectional encoder representations from transformers), have been added to the English version. As an intersection of natural language processing and machine learning, text data mining faces the challenges of both domains and has broad application to the Internet and mobile communication devices. The topics and techniques presented in this book are all technical foundations needed to develop such practical systems and have attracted much attention in recent years. It is hoped that this book will provide a comprehensive understanding for students, professors, and researchers in related areas. However, I must admit that, given the limits of the authors’ ability and breadth of knowledge, as well as the lack of time and energy, there must be some omissions or mistakes in the book. We will be very grateful if readers provide criticism, corrections, and any suggestions.
Beijing, China Chengqing Zong
20 May 2020

 

 
