1/19/2010

- Learn to run CUDA
- Get something running by Friday - DONE
- Look up how to get uniform random number generator working
- Look at matlab docs on rand()
- Tuesday, explain my understanding of random number generation
- Read the random number paper

- Get randG working in standard python by Feb 1

- Met with Dr. Wilson, received new targets
- Implement a for loop in cuda where some threads run longer than others

- Calculate the ith Fibonacci number, where i is the index of the ith processor
- Study and then explain the CUDA programming model to Dr. Wilson
- Investigate running a non-thread-safe random number generator in the parallel CUDA environment
- Look at the Mersenne Twister method and see how it can be used in parallel effectively
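The per-processor Fibonacci task above could be prototyped on the CPU first. A minimal sketch (function names are mine, not from the project code): each simulated "thread" computes the Fibonacci number for its own index, just as each CUDA thread would use its thread ID.

```python
def fib(i):
    """Iteratively compute the ith Fibonacci number (fib(0)=0, fib(1)=1)."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

def run_threads(num_threads):
    # In CUDA each thread would read its own thread index;
    # here we simply loop over the indices.
    return [fib(i) for i in range(num_threads)]
```

Because later Fibonacci numbers take more loop iterations, this also exercises the "some threads run longer than others" goal from the list above.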

- Found two good resources:

CUDA programming guide

Python random number generation

- Determined that the Mersenne Twister can be used in a multi-threaded environment; we must simply use jumpahead() (which changes the seed) to make the generated sequences in each thread less likely to coincide. It will require that each thread have an instance of a Python random object.
- Reading the programming guide currently; I will look into further methods of thread-safe random number generation.

- I determined that I was mistaken: the implementation in Python is the Wichmann-Hill random number generator, not the Mersenne Twister
- This method is not thread-safe because it is re-seeded on each iteration. In a parallel environment, all threads will get the same seed. This is overcome by calling jumpahead(k). This function simply re-seeds the number generator by acting as if k additional random numbers have been generated.
- The size and dimension of thread blocks in CUDA are user-specified for a given kernel call. You must specify the grid size, the block size, and the number of bytes of dynamically allocated block-level shared memory that are required.
- Given the above information, I am able to display IDs for the current thread and block, but not for a given processor.
- I have been unable to produce the code for this at the moment because I cannot remote into my CUDA machine, will talk with Ken tomorrow.
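The jumpahead() idea above can be sketched in plain Python. Note that random.jumpahead() existed in Python 2 and was removed in Python 3, so this sketch emulates it by discarding k draws; the function name and jump sizes are my own illustration, not the project code.

```python
import random

def make_stream(seed, jump):
    """One generator instance per thread; emulate jumpahead(k) by
    discarding k draws so the streams start at separated points."""
    rng = random.Random(seed)   # each thread owns its own instance
    for _ in range(jump):
        rng.random()
    return rng

# Four "threads" share one seed but jump ahead by different amounts,
# so their sequences are unlikely to coincide.
streams = [make_stream(1234, k * 1000) for k in range(4)]
first_draws = [s.random() for s in streams]
```

Giving each thread its own Random instance is the key point: the module-level random functions share one hidden generator and are not safe to re-seed concurrently.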

- Here is one reference on generating random numbers in parallel: click here
- Click here for presentation on pycuda
- Read through pycuda docs, worked on some examples
- Got Fibonacci number generator working
- Got issues with lin467-09 resolved, pycuda is working now
- One issue with the leapfrog approach is the period becomes bounded by P/n (where P is the previous period and n is the number of random number streams required). Not sure if this will be an issue.
- Several methods for parallel random number generation: here
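The leapfrog approach mentioned above can be sketched as a simple slicing rule: stream i takes every nth element of the source sequence, which is why each stream's period shrinks to P/n.

```python
def leapfrog(sequence, n):
    """Split one generator's output into n leapfrogged streams:
    stream i gets elements i, i+n, i+2n, ...  Each stream's period
    is the source period P divided by n."""
    return [sequence[i::n] for i in range(n)]

base = list(range(12))        # stands in for one RNG's output
streams = leapfrog(base, 3)   # three interleaved sub-streams
```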
- Todo

Understand usage of the random numbers, how will the code be called?

What will the output of the random generator be? An array? A single number?

- Read through Dr. Zare's paper
- Searching for references for Mersenne Twister in parallel
- Found and read reference for creating parallel Mersenne Twister
- Found reference for creating normal distribution from Mersenne Twister
- Going to implement simple Mersenne Twister in pycuda

- Read paper on reliable creation of parallel mersenne twister
- Began implementing this
- Have found C code that does the trick; however, I must wrap it and call it from Python
- Have figured out how this will work, just need to figure out how to wrap calls/objects in python

- Have read Cython tutorials
- Problem with python, need Ken to resolve, don't have permissions
- Presently writing wrapper for C library
- The next number is generated thusly: X(k+n) = X(k+m) xor ( ( upper(X(k)) | lower(X(k+1)) ) A ), where A is called the twist matrix.
- Have found the Box-Muller transformation. If the output of the Mersenne Twister is uniformly distributed, I can use this to convert it to a normal distribution. reference
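The recurrence above is the standard MT19937 state update (n = 624, m = 397). A minimal pure-Python sketch of it, for reference only (the class and method names are mine, not the project's C library): multiplying by the twist matrix A reduces to a right shift plus a conditional xor with the constant 0x9908B0DF.

```python
class MT19937:
    """Minimal MT19937 illustrating the recurrence
    X(k+n) = X(k+m) xor ( (upper(X(k)) | lower(X(k+1))) A )."""

    def __init__(self, seed):
        self.mt = [0] * 624
        self.index = 624
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, 624):
            self.mt[i] = (1812433253 * (self.mt[i - 1] ^ (self.mt[i - 1] >> 30)) + i) & 0xFFFFFFFF

    def _twist(self):
        for k in range(624):
            # upper bit of X(k) combined with lower 31 bits of X(k+1)
            y = (self.mt[k] & 0x80000000) | (self.mt[(k + 1) % 624] & 0x7FFFFFFF)
            # the twist matrix A: shift right, conditional xor
            self.mt[k] = self.mt[(k + 397) % 624] ^ (y >> 1)
            if y & 1:
                self.mt[k] ^= 0x9908B0DF
        self.index = 0

    def next_u32(self):
        if self.index >= 624:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        # tempering improves equidistribution of the raw state words
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y
```

Seeded with the reference value 5489, this reproduces the well-known MT19937 output sequence, which makes it handy for checking a CUDA port word by word.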

- Written code to wrap C library calls
- Have written PyCuda code to call this, and the array of structs is being produced correctly
- Have one bug in the struct: an array is not being copied with the struct, just the pointer. Must fix.

- Have fixed the array bug
- Need to determine what a proper seed for this would be
- Need to handle memory cleanup to prevent leak

- Have made mersenne module which creates structures and returns appropriate kernel
- Have not been able to separate modules, they must be in same SourceModule
- Have written code for the Gaussian distribution; however, I need to figure out how to map the input Mersenne numbers into the range (0,1]
- Need to determine a method for creating a gamma distribution from normal, or Mersenne, numbers
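Assuming the Mersenne output is a 32-bit unsigned integer, one common mapping to (0,1] (an assumption on my part, not taken from the project code) is to add one before dividing by 2^32:

```python
def to_unit_interval(x):
    """Map a 32-bit unsigned int x in [0, 2**32 - 1] to (0, 1].
    Adding 1 before dividing excludes 0, which keeps log(u)
    finite in the Box-Muller transform."""
    return (x + 1) / 4294967296.0   # 2**32
```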

- Implemented Gaussian and gamma distributions; used the Marsaglia paper for gamma and Box-Muller for Gaussian
- Need to determine a way to validate the output of my gauss and gamma functions
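One way to validate the gauss and gamma output is a moment check: compare sample mean against the known distribution mean. Below is a plain-Python sketch of both transforms (Box-Muller, and the Marsaglia-Tsang rejection method for alpha >= 1) with such a check; function names and the test seed are mine, and this stands in for, rather than reproduces, the CUDA code.

```python
import math
import random

def box_muller(u1, u2):
    """Box-Muller: two uniforms on (0, 1] -> two independent N(0,1) draws."""
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def marsaglia_gamma(rng, alpha):
    """Marsaglia-Tsang rejection method for Gamma(alpha, 1), alpha >= 1."""
    d = alpha - 1.0 / 3.0
    c = 1.0 / math.sqrt(9.0 * d)
    while True:
        x = rng.gauss(0.0, 1.0)
        v = (1.0 + c * x) ** 3
        if v <= 0.0:
            continue
        u = rng.random()
        if u == 0.0:
            continue
        # acceptance test: log u < x^2/2 + d(1 - v + ln v)
        if math.log(u) < 0.5 * x * x + d * (1.0 - v + math.log(v)):
            return d * v

# Moment check: Gamma(alpha, 1) has mean alpha, so the sample mean
# of many draws should land close to alpha.
rng = random.Random(2010)
samples = [marsaglia_gamma(rng, 2.0) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

The same style of check works for the Gaussian side (sample mean near 0, sample variance near 1), and a chi-square or Kolmogorov-Smirnov test would be a stricter follow-up.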

- Approach for matrix mult is to use blocks of threads; each block calculates a block_size x block_size sub-matrix. This reduces memory traffic.
- Need to determine block_size; possibly 16x16 or 8x8, will try both
- The number of blocks = (width of matrix^2 + block_size - 1) / block_size
- I found a good approach for solving the matrix mult problem here
- Have begun writing matrix mult code.
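The tiling plan above can be modeled on the CPU before writing the kernel. A plain-Python sketch (the real code targets pycuda; matmul_tiled and num_blocks are my names): each block_size x block_size tile of C is computed by walking the shared k dimension tile by tile, which is what lets the CUDA version stage sub-matrices in shared memory. The block-count formula from the note is ceiling division, with block_size meaning threads per block.

```python
def num_blocks(width, threads_per_block):
    """Blocks needed per the note above: ceiling division of the
    width^2 output elements over the threads in one block."""
    return (width * width + threads_per_block - 1) // threads_per_block

def matmul_naive(A, B):
    """Reference n^3 multiply used to validate the tiled version."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matmul_tiled(A, B, block_size):
    """CPU model of the CUDA plan: one tile of C per 'thread block'."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, block_size):          # tile row of C
        for jj in range(0, n, block_size):      # tile column of C
            for kk in range(0, n, block_size):  # tile along shared k dim
                for i in range(ii, min(ii + block_size, n)):
                    for j in range(jj, min(jj + block_size, n)):
                        acc = 0
                        for k in range(kk, min(kk + block_size, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] += acc
    return C
```

Comparing matmul_tiled against matmul_naive on small random matrices is a cheap correctness check before debugging the same tiling on the GPU, where the 16x16 vs 8x8 choice can then be timed.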