Nan Kai University of Technology Library

Scalable clustering algorithms.

Banerjee, Arindam.

Record type:	Bibliographic record, electronic resource : monograph
Title/Author:	Scalable clustering algorithms.
Author:	Banerjee, Arindam.
Extent:	246 p.
Note:	Source: Dissertation Abstracts International, Volume: 66-08, Section: B, page: 4384.
Contained By:	Dissertation Abstracts International 66-08B.
Subject:	Engineering, Electronics and Electrical.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3187659
ISBN:	9780542291036

Thesis (Ph.D.)--The University of Texas at Austin, 2005.

Scalable clustering algorithms that can work with a wide variety of distance measures and also incorporate application-specific requirements are critically important for modern-day data analysis and predictive modeling. In this thesis, we propose and analyze a large class of such algorithms, evaluate their performance on benchmark datasets and investigate theoretical connections of the proposed algorithms to lossy compression and stochastic prediction.

LDR 03488nmm 2200313 4500
001 1000004735
005 20061114130250.5
008 061114s2005 eng d
020 $a 9780542291036
035 $a (UnM)AAI3187659
035 $a AAI3187659
040 $a UnM $c UnM
100 1 $a Banerjee, Arindam. $3 1000005809
245 1 0 $a Scalable clustering algorithms.
300 $a 246 p.
500 $a Source: Dissertation Abstracts International, Volume: 66-08, Section: B, page: 4384.
500 $a Supervisor: Joydeep Ghosh.
502 $a Thesis (Ph.D.)--The University of Texas at Austin, 2005.
520 $a Scalable clustering algorithms that can work with a wide variety of distance measures and also incorporate application-specific requirements are critically important for modern-day data analysis and predictive modeling. In this thesis, we propose and analyze a large class of such algorithms, evaluate their performance on benchmark datasets and investigate theoretical connections of the proposed algorithms to lossy compression and stochastic prediction.
520 $a First, a wide variety of popular centroid-based clustering algorithms are unified using a large class of distance measures known as Bregman divergences. We present both hard and soft clustering algorithms using Bregman divergences. By establishing a bijection between regular exponential family distributions and regular Bregman divergences, we note that Bregman soft clustering algorithms are equivalent to learning mixtures of exponential family distributions, but can be computationally more efficient in practice. We also design algorithms for clustering directional data that generate balanced clusters, i.e., clusters of comparable sizes, a desirable property in certain practical applications. Experimental results show that such algorithms perform well for high-dimensional problems such as text clustering.
520 $a A general framework for scaling up balanced clustering algorithms is then proposed. The framework is applicable to all the algorithms presented in this thesis as well as a wide variety of other algorithms. Extensive experimental results on benchmark datasets are provided to establish the efficacy of the proposed framework. Further, we propose a new method for evaluation and model selection for clustering that can be applied to practically any clustering algorithm. The method is applicable in a transductive setting and measures the predictive accuracy of a clustering algorithm.
520 $a A detailed analysis of the connections between rate distortion theory and the proposed clustering algorithms, in particular the Bregman clustering algorithms, is also presented. In the process, we establish some key theoretical results in rate distortion theory for Bregman divergences, special cases of which have been studied in the literature for squared Euclidean distance. Also, we generalize a widely known result in stochastic prediction by establishing that the conditional expectation is the optimal predictor of a random variable if and only if the prediction error is measured by a Bregman divergence. This result explains the fundamental reason behind the efficiency of the Bregman clustering algorithms.
590 $a School code: 0227.
650 4 $a Engineering, Electronics and Electrical. $3 170927
650 4 $a Computer Science. $3 1000005419
690 $a 0544
690 $a 0984
710 2 0 $a The University of Texas at Austin. $3 1000005508
773 0 $t Dissertation Abstracts International $g 66-08B.
790 1 0 $a Ghosh, Joydeep, $e advisor
790 $a 0227
791 $a Ph.D.
792 $a 2005
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3187659
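The abstract's central claim, that one Lloyd-style (k-means-like) loop covers many centroid-based algorithms once the distance is a Bregman divergence d_phi(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>, and that the arithmetic mean stays the optimal cluster representative, can be illustrated with a short Python sketch. It is not the thesis's implementation: the function names, the random initialization, and the two generating functions (phi(x) = ||x||^2 for squared Euclidean distance, phi(x) = sum(x log x - x) for the generalized I-divergence) are illustrative assumptions.

```python
# Illustrative sketch only; all names here are hypothetical, not from the thesis.
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Divergence = Callable[[Sequence[float], Sequence[float]], float]


def make_bregman(phi: Callable[[Sequence[float]], float],
                 grad_phi: Callable[[Sequence[float]], Vector]) -> Divergence:
    """Return d_phi(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)> for a convex phi."""
    def d(x: Sequence[float], y: Sequence[float]) -> float:
        g = grad_phi(y)
        return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))
    return d


# phi(x) = ||x||^2 generates the squared Euclidean distance ||x - y||^2.
squared_euclidean = make_bregman(lambda x: sum(v * v for v in x),
                                 lambda x: [2.0 * v for v in x])

# phi(x) = sum(x log x - x), defined for positive x, generates the
# generalized I-divergence sum(x log(x / y) - x + y).
i_divergence = make_bregman(lambda x: sum(v * math.log(v) - v for v in x),
                            lambda x: [math.log(v) for v in x])


def mean(points: Sequence[Sequence[float]]) -> Vector:
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def bregman_hard_cluster(points: List[Vector], k: int, div: Divergence,
                         max_iter: int = 100,
                         seed: int = 0) -> Tuple[Optional[List[int]], List[Vector]]:
    """Hard clustering under an arbitrary Bregman divergence div (Lloyd-style)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins the centroid minimizing div(x, mu).
        new_labels = [min(range(k), key=lambda j, p=p: div(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: for every Bregman divergence the optimal representative
        # is the plain arithmetic mean of the cluster's members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return labels, centroids


def mean_beats_perturbations(points: List[Vector], div: Divergence,
                             eps: float = 0.1) -> bool:
    """True if no +/- eps coordinate shift of the mean lowers sum_i div(x_i, s)."""
    mu = mean(points)
    base = sum(div(p, mu) for p in points)
    for i in range(len(mu)):
        for delta in (-eps, eps):
            s = list(mu)
            s[i] += delta
            if sum(div(p, s) for p in points) < base:
                return False
    return True
```

The same `bregman_hard_cluster` loop runs unchanged with either divergence; only `div` differs. The last helper probes the mean-optimality property numerically: for a Bregman divergence no small shift of the mean lowers the total divergence, whereas for absolute error (not a Bregman divergence) the mean of the points [0], [0], [10] is beaten by moving toward the median.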