Prototypes | |
LALInferenceKmeans * | LALInferenceIncrementalKmeans (gsl_matrix *data, INT4 ntrials, gsl_rng *rng) |
Kmeans cluster data, increasing k until the BIC stops increasing. More... | |
LALInferenceKmeans * | LALInferenceOptimizedKmeans (gsl_matrix *data, INT4 ntrials, gsl_rng *rng) |
Kmeans cluster data, maximizing BIC assuming BIC(k) is concave-down. More... | |
LALInferenceKmeans * | LALInferenceXmeans (gsl_matrix *data, INT4 ntrials, gsl_rng *rng) |
Cluster data xmeans-style, splitting centroids according to the BIC. More... | |
LALInferenceKmeans * | LALInferenceRecursiveKmeans (gsl_matrix *data, INT4 ntrials, gsl_rng *rng) |
Run kmeans, recursively splitting centroids, accepting if the BIC increases. More... | |
void | LALInferenceKmeansRecursiveSplit (LALInferenceKmeans *kmeans, INT4 ntrials, gsl_rng *rng) |
Recursively split a k=1 kmeans. More... | |
void | LALInferenceKmeansRun (LALInferenceKmeans *kmeans) |
Run the kmeans algorithm until cluster assignments don't change. More... | |
LALInferenceKmeans * | LALInferenceKmeansRunBestOf (INT4 k, gsl_matrix *samples, INT4 ntrials, gsl_rng *rng) |
Run a kmeans several times and return the best. More... | |
LALInferenceKmeans * | LALInferenceCreateKmeans (INT4 k, gsl_matrix *data, gsl_rng *rng) |
Generate a new kmeans struct from a set of data. More... | |
void | LALInferenceKmeansSeededInitialize (LALInferenceKmeans *kmeans) |
A 'k-means++'-like seeded initialization of centroids for a kmeans run. More... | |
void | LALInferenceKmeansForgyInitialize (LALInferenceKmeans *kmeans) |
Randomly select points from the data as initial centroids for a kmeans run. More... | |
void | LALInferenceKmeansRandomPartitionInitialize (LALInferenceKmeans *kmeans) |
Use the resulting centroids from a random cluster assignment as the initial centroids of a kmeans run. More... | |
void | LALInferenceKmeansAssignment (LALInferenceKmeans *kmeans) |
The assignment step of the kmeans algorithm. More... | |
void | LALInferenceKmeansUpdate (LALInferenceKmeans *kmeans) |
The update step of the kmeans algorithm. More... | |
void | LALInferenceKmeansConstructMask (LALInferenceKmeans *kmeans, INT4 *mask, INT4 cluster_id) |
Construct a mask to select only the data assigned to a single cluster. More... | |
LALInferenceKmeans * | LALInferenceKmeansExtractCluster (LALInferenceKmeans *kmeans, INT4 cluster_id) |
Extract a single cluster from an existing kmeans as a new 1-means. More... | |
void | LALInferenceKmeansImposeBounds (LALInferenceKmeans *kmeans, LALInferenceVariables *params, LALInferenceVariables *priorArgs, INT4 cyclic_reflective) |
Impose boundaries on individual KDEs. More... | |
gsl_matrix * | mask_data (gsl_matrix *data, INT4 *mask) |
Generate a new matrix by masking an existing one. More... | |
void | accumulate_vectors (gsl_matrix **mat_ptr, gsl_vector *vec) |
Add a vector to the end of a matrix, allocating if necessary. More... | |
void | append_vec_to_mat (gsl_matrix **mat_ptr, gsl_vector *vec) |
Add a vector to the end of a matrix. More... | |
REAL8 * | LALInferenceKmeansDraw (LALInferenceKmeans *kmeans) |
Draw a sample from a kmeans-KDE estimate of a distribution. More... | |
REAL8 | LALInferenceKmeansPDF (LALInferenceKmeans *kmeans, REAL8 *pt) |
Evaluate the estimated (log) PDF from a clustered-KDE at a point. More... | |
REAL8 | LALInferenceWhitenedKmeansPDF (LALInferenceKmeans *kmeans, REAL8 *pt) |
Estimate (log) PDF from a clustered-KDE at an already whitened point. More... | |
gsl_matrix * | LALInferenceWhitenSamples (gsl_matrix *samples) |
Transform a data set to obtain a 0-mean and identity covariance matrix. More... | |
REAL8 | euclidean_dist_squared (gsl_vector *x, gsl_vector *y) |
Calculate the squared Euclidean distance bewteen two points. More... | |
void | euclidean_centroid (gsl_vector *centroid, gsl_matrix *data, INT4 *mask) |
Find the centroid of a masked data set. More... | |
void | LALInferenceKmeansBuildKDE (LALInferenceKmeans *kmeans) |
Build the kernel density estimate from a kmeans clustering. More... | |
REAL8 | LALInferenceKmeansMaxLogL (LALInferenceKmeans *kmeans) |
Calculate max likelihood of a kmeans assuming spherical Gaussian clusters. More... | |
REAL8 | LALInferenceKmeansBIC (LALInferenceKmeans *kmeans) |
Calculate the BIC using the KDEs of the cluster distributions. More... | |
void | LALInferenceKmeansDestroy (LALInferenceKmeans *kmeans) |
Destroy a kmeans instance. More... | |
Go to the source code of this file.
Data Structures | |
struct | LALInferenceKmeans |
Structure for performing the kmeans clustering algorithm on a set of samples. More... | |
LALInferenceKmeans * LALInferenceIncrementalKmeans | ( | gsl_matrix * | data, |
INT4 | ntrials, | ||
gsl_rng * | rng | ||
) |
Kmeans cluster data, increasing k until the BIC stops increasing.
Run the kmeans clustering algorithm incrementally, calculating the Bayes Information Criteria (BIC) at each step, stopping when the BIC stops increasing. The BIC is calculated by estimating the distribution in each cluster with a Gaussian kernel density estimate, then weighting that clusters contribution to the likelihood by the fraction of total samples in that cluster.
[in] | data | A GSL matrix containing the data to be clustered. |
[in] | ntrials | Number of kmeans attempts at fixed k to find optimal BIC. |
[in] | rng | A GSL random number generator used by the kmeans algorithm. |
Definition at line 65 of file LALInferenceClusteredKDE.c.
LALInferenceKmeans * LALInferenceOptimizedKmeans | ( | gsl_matrix * | data, |
INT4 | ntrials, | ||
gsl_rng * | rng | ||
) |
Kmeans cluster data, maximizing BIC assuming BIC(k) is concave-down.
Run the kmeans clustering algorithm, searching for the k that maximizes the Bayes Information Criteria (BIC). The BIC is calculated by estimating the distribution in each cluster with a Gaussian kernel density estimate, then weighing that clusters contribution to the likelihood by the fraction of total samples in that cluster.
[in] | data | A GSL matrix containing the data to be clustered. |
[in] | ntrials | Number of kmeans attempts at fixed k to find optimal BIC. |
[in] | rng | A GSL random number generator used by the kmeans algorithm. |
Definition at line 115 of file LALInferenceClusteredKDE.c.
LALInferenceKmeans * LALInferenceXmeans | ( | gsl_matrix * | data, |
INT4 | ntrials, | ||
gsl_rng * | rng | ||
) |
Cluster data xmeans-style, splitting centroids according to the BIC.
Run an xmeans-like clustering, increasing a kmeans clustering starting from k=1 by splitting each centroid individually and checking for an increase of the BIC over the parent cluster.
[in] | data | A GSL matrix containing the data to be clustered. |
[in] | ntrials | Number of kmeans attempts at fixed k to find optimal BIC. |
[in] | rng | A GSL random number generator used by the kmeans algorithm. |
Definition at line 264 of file LALInferenceClusteredKDE.c.
LALInferenceKmeans * LALInferenceRecursiveKmeans | ( | gsl_matrix * | data, |
INT4 | ntrials, | ||
gsl_rng * | rng | ||
) |
Run kmeans, recursively splitting centroids, accepting if the BIC increases.
Run kmeans recursively, splitting each centroid in two recursively. At each step, the BIC is checked before and after splitting. If the split increases the BIC, each centroid is then attempted to be split. If the BIC decreases, the split is rejected and the centroid is kept. This is less precise than other methods, but much faster, as the likelihood is computed at few and fewer points as centroids are split.
[in] | data | A GSL matrix containing the data to be clustered. |
[in] | ntrials | Number of kmeans attempts at fixed k to find optimal BIC. |
[in] | rng | A GSL random number generator used by the kmeans algorithm. |
Definition at line 376 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansRecursiveSplit | ( | LALInferenceKmeans * | kmeans, |
INT4 | ntrials, | ||
gsl_rng * | rng | ||
) |
Recursively split a k=1 kmeans.
Split a centroid in 2 recursively, evaluating the BIC to determine if the split is kept. This is a cheap method, as the BIC becomes cheaper to compute after each split.
[in] | kmeans | A k=1 kmeans to be split recursively. |
[in] | ntrials | Number of kmeans attempts at fixed k to find optimal BIC. |
[in] | rng | A GSL random number generator used by the kmeans algorithm. |
Definition at line 417 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansRun | ( | LALInferenceKmeans * | kmeans | ) |
Run the kmeans algorithm until cluster assignments don't change.
Starting with some random initialization, points are assigned to the closest centroids, then the centroids are calculated of the new cluster. This is repeated until the assignments stop changing.
kmeans | The initialized kmeans to run. |
Definition at line 469 of file LALInferenceClusteredKDE.c.
LALInferenceKmeans * LALInferenceKmeansRunBestOf | ( | INT4 | k, |
gsl_matrix * | samples, | ||
INT4 | ntrials, | ||
gsl_rng * | rng | ||
) |
Run a kmeans several times and return the best.
The kmeans is run ntrials times, each time with a new random initialization. The one that results in the highest Bayes Information Criteria (BIC) is returned.
[in] | k | The number of clusters to use. |
[in] | samples | The (unwhitened) data to cluster. |
[in] | ntrials | The number of random initialization to run. |
[in] | rng | The GSL random number generator to use for random initializations. |
Definition at line 497 of file LALInferenceClusteredKDE.c.
LALInferenceKmeans * LALInferenceCreateKmeans | ( | INT4 | k, |
gsl_matrix * | data, | ||
gsl_rng * | rng | ||
) |
Generate a new kmeans struct from a set of data.
Whitens data and fills a newly allocated kmeans structure of the requested size.
[in] | k | The requested number of clusters. |
[in] | data | The unwhitened data to be clustered. |
[in] | rng | A GSL random number generator used for initialization. |
Definition at line 558 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansSeededInitialize | ( | LALInferenceKmeans * | kmeans | ) |
A 'k-means++'-like seeded initialization of centroids for a kmeans run.
kmeans | The kmeans to initialize the centroids of. |
Definition at line 617 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansForgyInitialize | ( | LALInferenceKmeans * | kmeans | ) |
Randomly select points from the data as initial centroids for a kmeans run.
kmeans | The kmeans to initialize the centroids of. |
Definition at line 673 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansRandomPartitionInitialize | ( | LALInferenceKmeans * | kmeans | ) |
Use the resulting centroids from a random cluster assignment as the initial centroids of a kmeans run.
kmeans | The kmeans to initialize the centroids of. |
Definition at line 697 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansAssignment | ( | LALInferenceKmeans * | kmeans | ) |
The assignment step of the kmeans algorithm.
Assign all data to the closest centroid and calculate the error, defined as the cumulative sum of the distance between all points and their closest centroid.
kmeans | The kmeans to perform the assignment step on. |
Definition at line 718 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansUpdate | ( | LALInferenceKmeans * | kmeans | ) |
The update step of the kmeans algorithm.
Based on the current assignments, calculate the new centroid of each cluster.
kmeans | The kmeans to perform the update step on. |
Definition at line 766 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansConstructMask | ( | LALInferenceKmeans * | kmeans, |
INT4 * | mask, | ||
INT4 | cluster_id | ||
) |
Construct a mask to select only the data assigned to a single cluster.
Contruct a mask that selects the data from kmeans assigned to the centroid indexed in the centroid list by centroid_id.
[in] | kmeans | The kmeans clustering to mask the data of. |
[out] | mask | The mask that will select points assigned to cluster_id in kmeans. |
[in] | cluster_id | The index of the centroid in kmeans to choose points that are assigned to. |
Definition at line 790 of file LALInferenceClusteredKDE.c.
LALInferenceKmeans * LALInferenceKmeansExtractCluster | ( | LALInferenceKmeans * | kmeans, |
INT4 | cluster_id | ||
) |
Extract a single cluster from an existing kmeans as a new 1-means.
Given the index of the centroid requested, generate a new 1-means structure containing only the points assigned to that cluster.
[in] | kmeans | The parent kmeans to extract the cluster from. |
[in] | cluster_id | The index of the centroid in kmeans corresponding to the cluster to extract. |
Definition at line 814 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansImposeBounds | ( | LALInferenceKmeans * | kmeans, |
LALInferenceVariables * | params, | ||
LALInferenceVariables * | priorArgs, | ||
INT4 | cyclic_reflective | ||
) |
Impose boundaries on individual KDEs.
Draw samples from each cluster. If too many samples lie outside of the prior, impose a cyclic/reflective bound for the offending parameter(s) if requested. If boundaries aren't incountered, box in the cluster using the samples drawn. This avoids needing to evaluate KDEs that are too far away.
[in] | kmeans | kmeans to cycle through the clusters of. |
[in] | params | Parameters to impose bounds on. |
[in] | priorArgs | Variables containing prior boundaries. |
[in] | cyclic_reflective | Flag to check for cyclic/reflective bounds. |
Definition at line 851 of file LALInferenceClusteredKDE.c.
gsl_matrix * mask_data | ( | gsl_matrix * | data, |
INT4 * | mask | ||
) |
Generate a new matrix by masking an existing one.
Allocate and fill a new GSL matrix by masking an existing one.
[in] | data | GSL matrix whose rows will be masked. |
[in] | mask | 0/1 array with length equal to the number of rows in data. |
Definition at line 955 of file LALInferenceClusteredKDE.c.
void accumulate_vectors | ( | gsl_matrix ** | mat_ptr, |
gsl_vector * | vec | ||
) |
Add a vector to the end of a matrix, allocating if necessary.
Utility for acculating vectors in a GSL matrix, allocating if its the first vector to be accumulated.
[in] | mat_ptr | Pointer to the GSL matrix to append to. |
[in] | vec | The vector to append to the matrix pointed to by mat_ptr. |
Definition at line 996 of file LALInferenceClusteredKDE.c.
void append_vec_to_mat | ( | gsl_matrix ** | mat_ptr, |
gsl_vector * | vec | ||
) |
Add a vector to the end of a matrix.
Utility for acculating vectors in a GSL matrix.
[in] | mat_ptr | Pointer to the GSL matrix to append to. |
[in] | vec | The vector to append to the matrix pointed to by mat_ptr. |
Definition at line 1016 of file LALInferenceClusteredKDE.c.
REAL8 * LALInferenceKmeansDraw | ( | LALInferenceKmeans * | kmeans | ) |
Draw a sample from a kmeans-KDE estimate of a distribution.
Draw a random sample from the estimated distribution of a set of points. A cluster from kmeans is picked at random, from which a sample is drawn from the KDE of the cluster.
[in] | kmeans | The kmeans to draw a sample from. |
Definition at line 1046 of file LALInferenceClusteredKDE.c.
REAL8 LALInferenceKmeansPDF | ( | LALInferenceKmeans * | kmeans, |
REAL8 * | pt | ||
) |
Evaluate the estimated (log) PDF from a clustered-KDE at a point.
Compute the (log) value of the estimated probability density function from the clustered kernel density estimate of a distribution.
[in] | kmeans | The kmeans clustering to estimate the PDF from. |
[in] | pt | Array containing the point to evaluate the distribution at. |
Definition at line 1079 of file LALInferenceClusteredKDE.c.
REAL8 LALInferenceWhitenedKmeansPDF | ( | LALInferenceKmeans * | kmeans, |
REAL8 * | pt | ||
) |
Estimate (log) PDF from a clustered-KDE at an already whitened point.
Calculate the probability at a point from the clustered-KDE estimate, assuming the point has already been whitened using the same process as the stored data in kmeans. This is particularly useful when evaluating the BIC during clustering.
[in] | kmeans | The kmeans clustering to estimate the PDF from. |
[in] | pt | Array containing the point to evaluate the distribution at. |
Definition at line 1111 of file LALInferenceClusteredKDE.c.
gsl_matrix * LALInferenceWhitenSamples | ( | gsl_matrix * | samples | ) |
Transform a data set to obtain a 0-mean and identity covariance matrix.
Determine and execute the transformation of a data set to remove any global correlations in a data set.
[in] | samples | The data set to whiten. |
Definition at line 1136 of file LALInferenceClusteredKDE.c.
REAL8 euclidean_dist_squared | ( | gsl_vector * | x, |
gsl_vector * | y | ||
) |
Calculate the squared Euclidean distance bewteen two points.
[in] | x | A GSL vector. |
[in] | y | Another GSL vector. |
Definition at line 1176 of file LALInferenceClusteredKDE.c.
void euclidean_centroid | ( | gsl_vector * | centroid, |
gsl_matrix * | data, | ||
INT4 * | mask | ||
) |
Find the centroid of a masked data set.
[out] | centroid | The newly determined centroid. |
[in] | data | Set of points to determin centroid from. |
[in] | mask | Mask to select which samples in data to calculate the centroid of. |
Definition at line 1197 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansBuildKDE | ( | LALInferenceKmeans * | kmeans | ) |
Build the kernel density estimate from a kmeans clustering.
Given the current clustering, estimate the distribution of each cluster using a kernel density estimator. This will be used to evaluate the Bayes Information Criteria when deciding the optimal clustering.
kmeans | The kmeans to estimate the cluster distributions of. |
Definition at line 1226 of file LALInferenceClusteredKDE.c.
REAL8 LALInferenceKmeansMaxLogL | ( | LALInferenceKmeans * | kmeans | ) |
Calculate max likelihood of a kmeans assuming spherical Gaussian clusters.
Determine the maximum likelihood estimate (MLE) of the clustering assuming each cluster is drawn from a spherical Gaussian. This is not currently used, but is the original MLE used by the xmeans clustering algorithm.
kmeans | The kmeans to determine the BIC of. |
Definition at line 1248 of file LALInferenceClusteredKDE.c.
REAL8 LALInferenceKmeansBIC | ( | LALInferenceKmeans * | kmeans | ) |
Calculate the BIC using the KDEs of the cluster distributions.
Use the kernal density estimate of each clusters distribution to calculate the Bayes Information Criterian (BIC). The BIC penalizes the maximum likelihood of a given model based on the number of parameters of the model. This is used to deside the optimal clustering.
kmeans | The kmeans to determine the BIC of. |
Definition at line 1296 of file LALInferenceClusteredKDE.c.
void LALInferenceKmeansDestroy | ( | LALInferenceKmeans * | kmeans | ) |
Destroy a kmeans instance.
Systematically deallocate all memory associated to kmeans.
kmeans | The kmeans instance to deallocate. |
Definition at line 1335 of file LALInferenceClusteredKDE.c.