Private:hpc-discussion
1. Implementing the kernel matrix approximation algorithm of [HA09] on a multi-core CPU
Advantages:
- Faster kernel matrix computation than brute-force on the CPU
- Handling larger data sets by using the sparse representation of [HA09]
Possible improvements:
- Using Hilbert curves instead of the Z-curve
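As a concrete reference point for the reordering step, here is a minimal sketch of Z-curve (Morton-order) reordering for 2-D points; the function names and the 16-bit grid quantization are illustrative choices, not taken from [HA09]:

```python
import numpy as np

def morton_code_2d(x, y, bits=16):
    """Interleave the bits of integer coordinates x and y (the Z-curve index)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code

def z_order_permutation(points, bits=16):
    """Quantize 2-D points onto a grid and return the permutation that
    sorts them along the Z-curve, so nearby points become nearby indices."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    grid = ((pts - lo) / span * (2**bits - 1)).astype(int)
    keys = [morton_code_2d(gx, gy, bits) for gx, gy in grid]
    return np.argsort(keys, kind="stable")
```

A Hilbert curve would replace only `morton_code_2d` with a Hilbert index; it preserves locality better because consecutive indices are always adjacent cells, at the cost of a more involved index computation.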
2. Fast support vector machines
- Use the data reordering principle as in [HA09]
- Use locality sensitive hashing to reorder data in the input space
- Compute the kernel matrix using the new data reordering; the kernel matrix can then be computed efficiently in parallel on multiple cores
- Note: It is sufficient to obtain sub-optimal data reordering because it is only used to create the band kernel matrix
- Use QR decomposition to compute the eigendecomposition and regularize the kernel matrix
- Use the standard sequential minimal optimization (SMO) to compute the support vectors
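The reordering and band-kernel steps above can be sketched end to end. This assumes a Gaussian (RBF) kernel and a simple random-projection (sign) LSH, both stand-ins since the notes do not fix either choice; it also stores the band inside a dense matrix for clarity, where a real implementation would use a banded/sparse layout:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def lsh_order(X, n_projections=8, seed=0):
    """Order points by the sign pattern of random projections (a simple LSH).
    Points with equal hash keys are grouped together by the sort."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], n_projections))
    bits = (X @ R > 0).astype(int)
    keys = bits @ (1 << np.arange(n_projections))  # pack sign bits into an int key
    return np.argsort(keys, kind="stable")

def band_rbf_kernel(X, bandwidth, gamma=1.0, workers=4):
    """Compute only a band of the RBF kernel matrix around the diagonal,
    one row per task, in parallel on multiple threads."""
    n = X.shape[0]
    K = np.zeros((n, n))
    def fill_row(i):
        lo, hi = max(0, i - bandwidth), min(n, i + bandwidth + 1)
        d2 = ((X[lo:hi] - X[i]) ** 2).sum(axis=1)
        K[i, lo:hi] = np.exp(-gamma * d2)  # each task writes a distinct row
    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(fill_row, range(n)))
    return K

# Usage: reorder first, then compute the band on the reordered data
# perm = lsh_order(X); K = band_rbf_kernel(X[perm], bandwidth=50)
```

Because each thread writes a disjoint row of `K`, no locking is needed; this is what makes the multi-core computation embarrassingly parallel once the ordering is fixed.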
Advantages:
- Faster SVMs
- Handling larger data sets
Possible improvements:
- If we use kernel functions with compact support, can we relate the kernel bandwidth to the neighborhood size of LSH? If so, we might end up with a symmetric positive semi-definite kernel matrix that needs no further regularization
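On the compact-support question: compactly supported positive definite kernels do exist, e.g. Wendland's functions. Below is a sketch of the Wendland C^2 kernel k(r) = (1 - r)_+^4 (4r + 1), which is positive definite for input dimension up to three; choosing `support` on the order of the LSH neighborhood radius is the idea the bullet above raises, and would make the band structure exact rather than approximate:

```python
import numpy as np

def wendland_kernel(X, support):
    """Wendland C^2 compactly supported kernel: k(r) = (1 - r)_+^4 (4r + 1),
    positive definite for input dimension <= 3. Entries vanish exactly once
    the pairwise distance exceeds `support`, so the kernel matrix is sparse."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)) / support
    return np.where(d < 1, (1 - d) ** 4 * (4 * d + 1), 0.0)
```

If all far pairs are separated by more than `support`, the resulting matrix is symmetric positive semi-definite by construction, with no regularization step needed; whether the LSH neighborhood size can be tied to `support` tightly enough in practice is exactly the open question.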
3. Exascale support vector machines
- The algorithm to be developed in (2) lends itself to implementation on high-performance clusters
- Compute the new data reordering on a master node
- Divide the kernel matrix into blocks, based on the number of nodes and the memory available on each node, and compute each block on a separate node
- Run SMO on the master node and use message passing to communicate the required values
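A small sketch of the block-partitioning step above; the memory model (one contiguous dense row block of 8-byte entries per node) is an assumption for illustration, since the notes do not fix a storage layout:

```python
def partition_rows(n, n_nodes, node_mem_bytes, bytes_per_entry=8):
    """Split the n x n kernel matrix into contiguous row blocks, one per node,
    refusing the split if a node's block would not fit in its memory."""
    rows_that_fit = node_mem_bytes // (n * bytes_per_entry)
    rows_per_node = -(-n // n_nodes)  # ceil division: largest block in an even split
    if rows_per_node > rows_that_fit:
        raise ValueError("block does not fit in node memory: "
                         "add nodes or store only the band")
    blocks, start = [], 0
    for i in range(n_nodes):
        size = n // n_nodes + (1 if i < n % n_nodes else 0)
        blocks.append((start, start + size))
        start += size
    return blocks
```

The master would broadcast the reordering, assign each `(start, stop)` range to a node, and gather only the kernel values SMO requests via message passing, so no node ever materializes the full matrix.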
Advantages:
- Natural scalability to increasing data sizes