Loading [MathJax]/extensions/TeX/AMSsymbols.js
LALInference 4.1.9.1-5e288d3
All Data Structures Namespaces Files Functions Variables Typedefs Enumerations Enumerator Macros Modules Pages
LALInferenceClusteredKDE.h File Reference

Prototypes

LALInferenceKmeansLALInferenceIncrementalKmeans (gsl_matrix *data, INT4 ntrials, gsl_rng *rng)
 Kmeans cluster data, increasing k until the BIC stops increasing. More...
 
LALInferenceKmeansLALInferenceOptimizedKmeans (gsl_matrix *data, INT4 ntrials, gsl_rng *rng)
 Kmeans cluster data, maximizing BIC assuming BIC(k) is concave-down. More...
 
LALInferenceKmeansLALInferenceXmeans (gsl_matrix *data, INT4 ntrials, gsl_rng *rng)
 Cluster data xmeans-style, splitting centroids according to the BIC. More...
 
LALInferenceKmeansLALInferenceRecursiveKmeans (gsl_matrix *data, INT4 ntrials, gsl_rng *rng)
 Run kmeans, recursively splitting centroids, accepting if the BIC increases. More...
 
void LALInferenceKmeansRecursiveSplit (LALInferenceKmeans *kmeans, INT4 ntrials, gsl_rng *rng)
 Recursively split a k=1 kmeans. More...
 
void LALInferenceKmeansRun (LALInferenceKmeans *kmeans)
 Run the kmeans algorithm until cluster assignments don't change. More...
 
LALInferenceKmeansLALInferenceKmeansRunBestOf (INT4 k, gsl_matrix *samples, INT4 ntrials, gsl_rng *rng)
 Run a kmeans several times and return the best. More...
 
LALInferenceKmeansLALInferenceCreateKmeans (INT4 k, gsl_matrix *data, gsl_rng *rng)
 Generate a new kmeans struct from a set of data. More...
 
void LALInferenceKmeansSeededInitialize (LALInferenceKmeans *kmeans)
 A 'k-means++'-like seeded initialization of centroids for a kmeans run. More...
 
void LALInferenceKmeansForgyInitialize (LALInferenceKmeans *kmeans)
 Randomly select points from the data as initial centroids for a kmeans run. More...
 
void LALInferenceKmeansRandomPartitionInitialize (LALInferenceKmeans *kmeans)
 Use the resulting centroids from a random cluster assignment as the initial centroids of a kmeans run. More...
 
void LALInferenceKmeansAssignment (LALInferenceKmeans *kmeans)
 The assignment step of the kmeans algorithm. More...
 
void LALInferenceKmeansUpdate (LALInferenceKmeans *kmeans)
 The update step of the kmeans algorithm. More...
 
void LALInferenceKmeansConstructMask (LALInferenceKmeans *kmeans, INT4 *mask, INT4 cluster_id)
 Construct a mask to select only the data assigned to a single cluster. More...
 
LALInferenceKmeansLALInferenceKmeansExtractCluster (LALInferenceKmeans *kmeans, INT4 cluster_id)
 Extract a single cluster from an existing kmeans as a new 1-means. More...
 
void LALInferenceKmeansImposeBounds (LALInferenceKmeans *kmeans, LALInferenceVariables *params, LALInferenceVariables *priorArgs, INT4 cyclic_reflective)
 Impose boundaries on individual KDEs. More...
 
gsl_matrix * mask_data (gsl_matrix *data, INT4 *mask)
 Generate a new matrix by masking an existing one. More...
 
void accumulate_vectors (gsl_matrix **mat_ptr, gsl_vector *vec)
 Add a vector to the end of a matrix, allocating if necessary. More...
 
void append_vec_to_mat (gsl_matrix **mat_ptr, gsl_vector *vec)
 Add a vector to the end of a matrix. More...
 
REAL8LALInferenceKmeansDraw (LALInferenceKmeans *kmeans)
 Draw a sample from a kmeans-KDE estimate of a distribution. More...
 
REAL8 LALInferenceKmeansPDF (LALInferenceKmeans *kmeans, REAL8 *pt)
 Evaluate the estimated (log) PDF from a clustered-KDE at a point. More...
 
REAL8 LALInferenceWhitenedKmeansPDF (LALInferenceKmeans *kmeans, REAL8 *pt)
 Estimate (log) PDF from a clustered-KDE at an already whitened point. More...
 
gsl_matrix * LALInferenceWhitenSamples (gsl_matrix *samples)
 Transform a data set to obtain a 0-mean and identity covariance matrix. More...
 
REAL8 euclidean_dist_squared (gsl_vector *x, gsl_vector *y)
 Calculate the squared Euclidean distance bewteen two points. More...
 
void euclidean_centroid (gsl_vector *centroid, gsl_matrix *data, INT4 *mask)
 Find the centroid of a masked data set. More...
 
void LALInferenceKmeansBuildKDE (LALInferenceKmeans *kmeans)
 Build the kernel density estimate from a kmeans clustering. More...
 
REAL8 LALInferenceKmeansMaxLogL (LALInferenceKmeans *kmeans)
 Calculate max likelihood of a kmeans assuming spherical Gaussian clusters. More...
 
REAL8 LALInferenceKmeansBIC (LALInferenceKmeans *kmeans)
 Calculate the BIC using the KDEs of the cluster distributions. More...
 
void LALInferenceKmeansDestroy (LALInferenceKmeans *kmeans)
 Destroy a kmeans instance. More...
 

Go to the source code of this file.

Data Structures

struct  LALInferenceKmeans
 Structure for performing the kmeans clustering algorithm on a set of samples. More...
 

Function Documentation

◆ LALInferenceIncrementalKmeans()

LALInferenceKmeans * LALInferenceIncrementalKmeans ( gsl_matrix *  data,
INT4  ntrials,
gsl_rng *  rng 
)

Kmeans cluster data, increasing k until the BIC stops increasing.

Run the kmeans clustering algorithm incrementally, calculating the Bayes Information Criteria (BIC) at each step, stopping when the BIC stops increasing. The BIC is calculated by estimating the distribution in each cluster with a Gaussian kernel density estimate, then weighting that clusters contribution to the likelihood by the fraction of total samples in that cluster.

Parameters
[in]dataA GSL matrix containing the data to be clustered.
[in]ntrialsNumber of kmeans attempts at fixed k to find optimal BIC.
[in]rngA GSL random number generator used by the kmeans algorithm.
Returns
A kmeans structure containing the clustering that maximizes the BIC.

Definition at line 65 of file LALInferenceClusteredKDE.c.

◆ LALInferenceOptimizedKmeans()

LALInferenceKmeans * LALInferenceOptimizedKmeans ( gsl_matrix *  data,
INT4  ntrials,
gsl_rng *  rng 
)

Kmeans cluster data, maximizing BIC assuming BIC(k) is concave-down.

Run the kmeans clustering algorithm, searching for the k that maximizes the Bayes Information Criteria (BIC). The BIC is calculated by estimating the distribution in each cluster with a Gaussian kernel density estimate, then weighing that clusters contribution to the likelihood by the fraction of total samples in that cluster.

Parameters
[in]dataA GSL matrix containing the data to be clustered.
[in]ntrialsNumber of kmeans attempts at fixed k to find optimal BIC.
[in]rngA GSL random number generator used by the kmeans algorithm.
Returns
A kmeans structure containing the clustering that maximizes the BIC.

Definition at line 115 of file LALInferenceClusteredKDE.c.

◆ LALInferenceXmeans()

LALInferenceKmeans * LALInferenceXmeans ( gsl_matrix *  data,
INT4  ntrials,
gsl_rng *  rng 
)

Cluster data xmeans-style, splitting centroids according to the BIC.

Run an xmeans-like clustering, increasing a kmeans clustering starting from k=1 by splitting each centroid individually and checking for an increase of the BIC over the parent cluster.

Parameters
[in]dataA GSL matrix containing the data to be clustered.
[in]ntrialsNumber of kmeans attempts at fixed k to find optimal BIC.
[in]rngA GSL random number generator used by the kmeans algorithm.
Returns
A kmeans structure containing the clustering that maximizes the BIC.

Definition at line 264 of file LALInferenceClusteredKDE.c.

◆ LALInferenceRecursiveKmeans()

LALInferenceKmeans * LALInferenceRecursiveKmeans ( gsl_matrix *  data,
INT4  ntrials,
gsl_rng *  rng 
)

Run kmeans, recursively splitting centroids, accepting if the BIC increases.

Run kmeans recursively, splitting each centroid in two recursively. At each step, the BIC is checked before and after splitting. If the split increases the BIC, each centroid is then attempted to be split. If the BIC decreases, the split is rejected and the centroid is kept. This is less precise than other methods, but much faster, as the likelihood is computed at few and fewer points as centroids are split.

Parameters
[in]dataA GSL matrix containing the data to be clustered.
[in]ntrialsNumber of kmeans attempts at fixed k to find optimal BIC.
[in]rngA GSL random number generator used by the kmeans algorithm.
Returns
A kmeans structure containing the clustering that maximizes the BIC.

Definition at line 376 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansRecursiveSplit()

void LALInferenceKmeansRecursiveSplit ( LALInferenceKmeans kmeans,
INT4  ntrials,
gsl_rng *  rng 
)

Recursively split a k=1 kmeans.

Split a centroid in 2 recursively, evaluating the BIC to determine if the split is kept. This is a cheap method, as the BIC becomes cheaper to compute after each split.

Parameters
[in]kmeansA k=1 kmeans to be split recursively.
[in]ntrialsNumber of kmeans attempts at fixed k to find optimal BIC.
[in]rngA GSL random number generator used by the kmeans algorithm.
See also
LALInferenceRecursiveKmeans()

Definition at line 417 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansRun()

void LALInferenceKmeansRun ( LALInferenceKmeans kmeans)

Run the kmeans algorithm until cluster assignments don't change.

Starting with some random initialization, points are assigned to the closest centroids, then the centroids are calculated of the new cluster. This is repeated until the assignments stop changing.

Parameters
kmeansThe initialized kmeans to run.

Definition at line 469 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansRunBestOf()

LALInferenceKmeans * LALInferenceKmeansRunBestOf ( INT4  k,
gsl_matrix *  samples,
INT4  ntrials,
gsl_rng *  rng 
)

Run a kmeans several times and return the best.

The kmeans is run ntrials times, each time with a new random initialization. The one that results in the highest Bayes Information Criteria (BIC) is returned.

Parameters
[in]kThe number of clusters to use.
[in]samplesThe (unwhitened) data to cluster.
[in]ntrialsThe number of random initialization to run.
[in]rngThe GSL random number generator to use for random initializations.
Returns
The kmeans with the highest BIC of ntrials attempts.

Definition at line 497 of file LALInferenceClusteredKDE.c.

◆ LALInferenceCreateKmeans()

LALInferenceKmeans * LALInferenceCreateKmeans ( INT4  k,
gsl_matrix *  data,
gsl_rng *  rng 
)

Generate a new kmeans struct from a set of data.

Whitens data and fills a newly allocated kmeans structure of the requested size.

Parameters
[in]kThe requested number of clusters.
[in]dataThe unwhitened data to be clustered.
[in]rngA GSL random number generator used for initialization.

Definition at line 558 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansSeededInitialize()

void LALInferenceKmeansSeededInitialize ( LALInferenceKmeans kmeans)

A 'k-means++'-like seeded initialization of centroids for a kmeans run.

Parameters
kmeansThe kmeans to initialize the centroids of.

Definition at line 617 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansForgyInitialize()

void LALInferenceKmeansForgyInitialize ( LALInferenceKmeans kmeans)

Randomly select points from the data as initial centroids for a kmeans run.

Parameters
kmeansThe kmeans to initialize the centroids of.

Definition at line 673 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansRandomPartitionInitialize()

void LALInferenceKmeansRandomPartitionInitialize ( LALInferenceKmeans kmeans)

Use the resulting centroids from a random cluster assignment as the initial centroids of a kmeans run.

Parameters
kmeansThe kmeans to initialize the centroids of.

Definition at line 697 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansAssignment()

void LALInferenceKmeansAssignment ( LALInferenceKmeans kmeans)

The assignment step of the kmeans algorithm.

Assign all data to the closest centroid and calculate the error, defined as the cumulative sum of the distance between all points and their closest centroid.

Parameters
kmeansThe kmeans to perform the assignment step on.

Definition at line 718 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansUpdate()

void LALInferenceKmeansUpdate ( LALInferenceKmeans kmeans)

The update step of the kmeans algorithm.

Based on the current assignments, calculate the new centroid of each cluster.

Parameters
kmeansThe kmeans to perform the update step on.

Definition at line 766 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansConstructMask()

void LALInferenceKmeansConstructMask ( LALInferenceKmeans kmeans,
INT4 mask,
INT4  cluster_id 
)

Construct a mask to select only the data assigned to a single cluster.

Contruct a mask that selects the data from kmeans assigned to the centroid indexed in the centroid list by centroid_id.

Parameters
[in]kmeansThe kmeans clustering to mask the data of.
[out]maskThe mask that will select points assigned to cluster_id in kmeans.
[in]cluster_idThe index of the centroid in kmeans to choose points that are assigned to.

Definition at line 790 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansExtractCluster()

LALInferenceKmeans * LALInferenceKmeansExtractCluster ( LALInferenceKmeans kmeans,
INT4  cluster_id 
)

Extract a single cluster from an existing kmeans as a new 1-means.

Given the index of the centroid requested, generate a new 1-means structure containing only the points assigned to that cluster.

Parameters
[in]kmeansThe parent kmeans to extract the cluster from.
[in]cluster_idThe index of the centroid in kmeans corresponding to the cluster to extract.
Returns
A kmeans with k=1 containing only the points assigned to cluster cluster_id from kmeans.

Definition at line 814 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansImposeBounds()

void LALInferenceKmeansImposeBounds ( LALInferenceKmeans kmeans,
LALInferenceVariables params,
LALInferenceVariables priorArgs,
INT4  cyclic_reflective 
)

Impose boundaries on individual KDEs.

Draw samples from each cluster. If too many samples lie outside of the prior, impose a cyclic/reflective bound for the offending parameter(s) if requested. If boundaries aren't incountered, box in the cluster using the samples drawn. This avoids needing to evaluate KDEs that are too far away.

Parameters
[in]kmeanskmeans to cycle through the clusters of.
[in]paramsParameters to impose bounds on.
[in]priorArgsVariables containing prior boundaries.
[in]cyclic_reflectiveFlag to check for cyclic/reflective bounds.

Definition at line 851 of file LALInferenceClusteredKDE.c.

◆ mask_data()

gsl_matrix * mask_data ( gsl_matrix *  data,
INT4 mask 
)

Generate a new matrix by masking an existing one.

Allocate and fill a new GSL matrix by masking an existing one.

Parameters
[in]dataGSL matrix whose rows will be masked.
[in]mask0/1 array with length equal to the number of rows in data.
Returns
A new GSL matrix constaining only the masked data.

Definition at line 955 of file LALInferenceClusteredKDE.c.

◆ accumulate_vectors()

void accumulate_vectors ( gsl_matrix **  mat_ptr,
gsl_vector *  vec 
)

Add a vector to the end of a matrix, allocating if necessary.

Utility for acculating vectors in a GSL matrix, allocating if its the first vector to be accumulated.

Parameters
[in]mat_ptrPointer to the GSL matrix to append to.
[in]vecThe vector to append to the matrix pointed to by mat_ptr.
See also
append_vec_to_mat()

Definition at line 996 of file LALInferenceClusteredKDE.c.

◆ append_vec_to_mat()

void append_vec_to_mat ( gsl_matrix **  mat_ptr,
gsl_vector *  vec 
)

Add a vector to the end of a matrix.

Utility for acculating vectors in a GSL matrix.

Parameters
[in]mat_ptrPointer to the GSL matrix to append to.
[in]vecThe vector to append to the matrix pointed to by mat_ptr.

Definition at line 1016 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansDraw()

REAL8 * LALInferenceKmeansDraw ( LALInferenceKmeans kmeans)

Draw a sample from a kmeans-KDE estimate of a distribution.

Draw a random sample from the estimated distribution of a set of points. A cluster from kmeans is picked at random, from which a sample is drawn from the KDE of the cluster.

Parameters
[in]kmeansThe kmeans to draw a sample from.
Returns
An array containing a sample drawn from kmeans.

Definition at line 1046 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansPDF()

REAL8 LALInferenceKmeansPDF ( LALInferenceKmeans kmeans,
REAL8 pt 
)

Evaluate the estimated (log) PDF from a clustered-KDE at a point.

Compute the (log) value of the estimated probability density function from the clustered kernel density estimate of a distribution.

Parameters
[in]kmeansThe kmeans clustering to estimate the PDF from.
[in]ptArray containing the point to evaluate the distribution at.
Returns
The estimated value of the PDF at pt.

Definition at line 1079 of file LALInferenceClusteredKDE.c.

◆ LALInferenceWhitenedKmeansPDF()

REAL8 LALInferenceWhitenedKmeansPDF ( LALInferenceKmeans kmeans,
REAL8 pt 
)

Estimate (log) PDF from a clustered-KDE at an already whitened point.

Calculate the probability at a point from the clustered-KDE estimate, assuming the point has already been whitened using the same process as the stored data in kmeans. This is particularly useful when evaluating the BIC during clustering.

Parameters
[in]kmeansThe kmeans clustering to estimate the PDF from.
[in]ptArray containing the point to evaluate the distribution at.
Returns
The estimated value of the PDF at pt.

Definition at line 1111 of file LALInferenceClusteredKDE.c.

◆ LALInferenceWhitenSamples()

gsl_matrix * LALInferenceWhitenSamples ( gsl_matrix *  samples)

Transform a data set to obtain a 0-mean and identity covariance matrix.

Determine and execute the transformation of a data set to remove any global correlations in a data set.

Parameters
[in]samplesThe data set to whiten.
Returns
A whitened data set with samples corresponding to samples.

Definition at line 1136 of file LALInferenceClusteredKDE.c.

◆ euclidean_dist_squared()

REAL8 euclidean_dist_squared ( gsl_vector *  x,
gsl_vector *  y 
)

Calculate the squared Euclidean distance bewteen two points.

Parameters
[in]xA GSL vector.
[in]yAnother GSL vector.
Returns
The squared Euclidean distance between x and y.

Definition at line 1176 of file LALInferenceClusteredKDE.c.

◆ euclidean_centroid()

void euclidean_centroid ( gsl_vector *  centroid,
gsl_matrix *  data,
INT4 mask 
)

Find the centroid of a masked data set.

Parameters
[out]centroidThe newly determined centroid.
[in]dataSet of points to determin centroid from.
[in]maskMask to select which samples in data to calculate the centroid of.

Definition at line 1197 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansBuildKDE()

void LALInferenceKmeansBuildKDE ( LALInferenceKmeans kmeans)

Build the kernel density estimate from a kmeans clustering.

Given the current clustering, estimate the distribution of each cluster using a kernel density estimator. This will be used to evaluate the Bayes Information Criteria when deciding the optimal clustering.

Parameters
kmeansThe kmeans to estimate the cluster distributions of.

Definition at line 1226 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansMaxLogL()

REAL8 LALInferenceKmeansMaxLogL ( LALInferenceKmeans kmeans)

Calculate max likelihood of a kmeans assuming spherical Gaussian clusters.

Determine the maximum likelihood estimate (MLE) of the clustering assuming each cluster is drawn from a spherical Gaussian. This is not currently used, but is the original MLE used by the xmeans clustering algorithm.

Parameters
kmeansThe kmeans to determine the BIC of.
Returns
The maximum likelihood estimate of the kmeans clustering.

Definition at line 1248 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansBIC()

REAL8 LALInferenceKmeansBIC ( LALInferenceKmeans kmeans)

Calculate the BIC using the KDEs of the cluster distributions.

Use the kernal density estimate of each clusters distribution to calculate the Bayes Information Criterian (BIC). The BIC penalizes the maximum likelihood of a given model based on the number of parameters of the model. This is used to deside the optimal clustering.

Parameters
kmeansThe kmeans to determine the BIC of.
Returns
The Bayes Information Criterian of the kmeans clustering.

Definition at line 1296 of file LALInferenceClusteredKDE.c.

◆ LALInferenceKmeansDestroy()

void LALInferenceKmeansDestroy ( LALInferenceKmeans kmeans)

Destroy a kmeans instance.

Systematically deallocate all memory associated to kmeans.

Parameters
kmeansThe kmeans instance to deallocate.

Definition at line 1335 of file LALInferenceClusteredKDE.c.