Data Shaping Solutions

Statistical Software

  1. Robust Multivariate Ridge and Linear Regression with Bootstrap

  2. Stock Market Simulator

  3. Simulation of Clustered Data

    • Produces simulated clusters.
    • Description: The seed routine creates a cluster of 1,000 points and saves it in cluster.txt. Each row corresponds to one point: the first column is the cluster number, and the next two columns are the x and y coordinates. The cluster number is automatically incremented on each call to seed, so every call creates a new cluster. The distance routine randomly selects 100 points from the previously created data set (cluster.txt) and computes the distance between every pair of them. The output is a file, dist.txt, with one row per pair of points and two fields: the first is an indicator equal to 1 if both points belong to the same cluster (0 otherwise), and the second is the distance between the two points. The script illustrates how to check whether a data set contains one or two clusters by looking at the distribution of distances: a gap in the distribution indicates distinct clusters. It also suggests that, with sampling techniques, determining whether a data set contains one or more clusters has computational complexity well below O(n), possibly O(n^0.5).
    • Perl source code (77 lines)
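The seed/distance procedure described above can be sketched in Python. This is a minimal illustration, not the original 77-line Perl script: the Gaussian shape of each cluster, the spread, the cluster centers, and the helper names seed, sample_distances, and largest_gap are all assumptions. Rows mirror the layout of cluster.txt (cluster number, x, y) and dist.txt (same-cluster indicator, distance), but are kept in memory instead of being written to files.

```python
import math
import random


def seed(points, cx, cy, n=1000, spread=1.0, rng=random):
    """Append a new cluster of n points as (cluster_number, x, y) rows.

    The cluster number is incremented automatically from the last row.
    Assumption: points are Gaussian around (cx, cy) with the given spread.
    """
    cluster = points[-1][0] + 1 if points else 1
    for _ in range(n):
        points.append((cluster, rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return cluster


def sample_distances(points, m=100, rng=random):
    """Pick m random points; return (same_cluster, distance) for every pair."""
    sample = rng.sample(points, m)
    rows = []
    for i in range(m):
        ci, xi, yi = sample[i]
        for j in range(i + 1, m):
            cj, xj, yj = sample[j]
            same = 1 if ci == cj else 0  # indicator column of dist.txt
            rows.append((same, math.hypot(xi - xj, yi - yj)))
    return rows


def largest_gap(distances):
    """Return (width, lower_edge) of the widest gap between sorted distances."""
    d = sorted(distances)
    best = (0.0, d[0])
    for a, b in zip(d, d[1:]):
        if b - a > best[0]:
            best = (b - a, a)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    data = []
    seed(data, 0.0, 0.0, rng=rng)   # cluster 1
    seed(data, 20.0, 0.0, rng=rng)  # cluster 2
    rows = sample_distances(data, m=100, rng=rng)
    width, edge = largest_gap([d for _, d in rows])
    print(f"widest gap {width:.2f} starting at distance {edge:.2f}")
```

With two well-separated clusters, the within-cluster distances (indicator 1) and between-cluster distances (indicator 0) fall on either side of a wide gap; with a single cluster, the sorted distances show no comparable gap. Because only m sampled points are compared, the cost is O(m²) distance computations regardless of the data set size n, which is the intuition behind the sub-linear claim.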

