Private:hpc-discussion


Implementing the kernel matrix approximation algorithm of [HA09] on a multi-core CPU

Advantages:

  • Faster kernel matrix computation than brute-force on the CPU
  • Handling larger data sets by using the sparse representation of [HA09]

Possible improvements:

  • Using Hilbert curves instead of the Z-curve
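To make the reordering step concrete, here is a minimal sketch of Z-curve (Morton-order) reordering for 2-D points, assuming coordinates are first quantized to a fixed integer grid; the function names and the choice of 16 bits per axis are illustrative, not from [HA09]:

```python
import numpy as np

def morton_code(ix, iy, bits=16):
    """Interleave the bits of integer grid coordinates (ix, iy)
    to produce a Z-curve (Morton) index."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)       # x bit -> even position
        code |= ((iy >> b) & 1) << (2 * b + 1)   # y bit -> odd position
    return code

def z_order(points, bits=16):
    """Return the permutation that sorts 2-D points along the Z-curve.
    Points are quantized to a 2^bits x 2^bits grid first."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    scale = (2**bits - 1) / np.maximum(hi - lo, 1e-12)
    grid = ((points - lo) * scale).astype(np.int64)
    codes = [morton_code(ix, iy, bits) for ix, iy in grid]
    return np.argsort(codes, kind="stable")

# Nearby points in the plane end up nearby in the ordering, which
# concentrates the large kernel values near the matrix diagonal.
pts = np.array([[0.0, 0.0], [0.9, 0.9], [0.05, 0.1], [0.85, 0.95]])
order = z_order(pts)
```

A Hilbert-curve variant would replace `morton_code` with a Hilbert index; Hilbert curves avoid the long jumps of the Z-curve, so neighboring indices are more consistently neighboring points.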


Fast support vector machines

  • Use the data reordering principle as in [HA09]
  • Use locality sensitive hashing to reorder data in the input space
  • Compute the kernel matrix using the new data ordering; the kernel matrix can then be computed efficiently in parallel on multiple cores
  • Note: a sub-optimal data reordering is sufficient, because the ordering is only used to form the band kernel matrix
  • Use a QR decomposition to compute the eigendecomposition and regularize the kernel matrix
  • Use the standard sequential minimal optimization (SMO) to compute the support vectors
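The first three steps above can be sketched as follows. This is a hedged illustration, not the final design: it uses sign-of-random-projection hashing as the LSH family and a Gaussian kernel, and the function names, the bit count, and the band half-width are all placeholder choices:

```python
import numpy as np

def lsh_reorder(X, n_bits=8, seed=0):
    """Reorder rows of X by a random-projection (sign) LSH code.
    Points with similar codes, hence likely nearby in input space,
    become adjacent; a sub-optimal ordering is acceptable here."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes > 0).astype(np.uint64)
    codes = bits @ (2 ** np.arange(n_bits, dtype=np.uint64))
    return np.argsort(codes, kind="stable")

def banded_rbf_kernel(X, order, bandwidth, gamma=1.0):
    """Compute only the entries within `bandwidth` of the diagonal of
    the reordered Gaussian kernel matrix; all other entries are left
    at zero.  Each row is independent, so in a real implementation the
    rows would be distributed across cores."""
    Xo = X[order]
    n = len(Xo)
    K = np.zeros((n, n))
    for i in range(n):
        j0, j1 = max(0, i - bandwidth), min(n, i + bandwidth + 1)
        d2 = ((Xo[j0:j1] - Xo[i]) ** 2).sum(axis=1)
        K[i, j0:j1] = np.exp(-gamma * d2)
    return K

# Example: reorder a small data set and form a band of half-width 5.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
order = lsh_reorder(X)
K = banded_rbf_kernel(X, order, bandwidth=5, gamma=0.5)
```

The banded matrix produced this way is symmetric by construction but, being a truncation of the full kernel matrix, may lose positive semi-definiteness, which is why the regularization step above is needed before running SMO.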

Advantages:

  • Faster SVMs
  • Handling larger data sets

Possible improvements:

  • If we use kernel functions with compact support, can we relate the kernel bandwidth to the LSH neighborhood size? If so, we might end up with a symmetric, positive semi-definite kernel matrix that needs no further regularization
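As a small numerical check of this idea, the sketch below uses a Wendland C2 kernel, one standard compactly supported kernel that is positive definite for input dimension at most 3; it is my stand-in example, not a kernel prescribed by [HA09]. Entries beyond the support radius are exactly zero, so if the LSH neighborhood size matches the support radius, banding introduces no truncation error at all:

```python
import numpy as np

def wendland_c2(r):
    """Wendland C2 compactly supported kernel: positive definite for
    inputs of dimension <= 3 and exactly zero for r >= 1."""
    r = np.asarray(r)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

# 1-D stand-in for a reordered data set: sorted points on a line.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, size=(30, 1)), axis=0)

h = 1.5                      # support radius = assumed LSH neighborhood size
D = np.abs(X - X.T)          # pairwise distances (30 x 30 via broadcasting)
K = wendland_c2(D / h)       # exactly zero outside the band
eigs = np.linalg.eigvalsh(K) # all (numerically) non-negative
```

In this low-dimensional setting the sparse matrix is positive semi-definite as-is; whether the same holds after LSH-based reordering in higher dimensions is exactly the open question raised above.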


Exa-scale support vector machines

  • The algorithm to be developed in (2) lends itself to implementation on high-performance clusters
  • Compute the new data reordering on a master node
  • Divide the kernel matrix into blocks and compute each block on a separate node, based on the number of nodes, the size of the data, and the memory available on each node
  • Run SMO on the master node and use message passing to communicate the required values
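The block-division step might look like the following sketch. The function name and sizing policy are hypothetical; it only computes the row ranges each node would own (the actual message passing, e.g. via MPI, is not shown), capping each block by the memory available per node:

```python
def partition_rows(n, n_nodes, node_mem_bytes, bytes_per_entry=8):
    """Assign contiguous row blocks of an n x n kernel matrix to nodes.
    Block height is capped by what one node can hold in memory; the
    master would send each node its row range plus the reordered data.
    If memory is the binding constraint, there are more blocks than
    nodes and each node processes several blocks in turn."""
    max_rows = max(1, node_mem_bytes // (n * bytes_per_entry))
    rows_per_node = min(max_rows, -(-n // n_nodes))  # ceil(n / n_nodes)
    blocks = []
    start = 0
    while start < n:
        blocks.append((start, min(start + rows_per_node, n)))
        start += rows_per_node
    return blocks

# Example: 10,000 points, 8 nodes, 512 MiB of matrix memory per node.
blocks = partition_rows(10_000, 8, 512 * 2**20)
```

Because each block depends only on the reordered data and its row range, blocks can be computed with no inter-node communication; only the SMO phase on the master needs values sent back.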

Advantages:

  • Natural scalability to increasing data sizes