This is the code release for DOCA (Detecting Overlapping Communities Algorithm), described in the paper
********************************************************************************
*  "Overlapping Community Structures and Their Detection on Social Networks"   * 
*  Nam P. Nguyen, Thang N. Dinh, Dung T. Nguyen and My T. Thai                 *
*  The 3rd IEEE Int. Conf. on Social Computing (SOCIALCOM) 2011                *
*  http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6113092             *
*  Email: nanguyen@cise.ufl.edu                                                *
********************************************************************************
Please cite the above paper if you use this code in your research. Thank you!

--------------
 Instructions 
--------------
1. Input file
The input file should be an edge list in which each line has the form "u v", where u and v are node IDs starting from 1. The edge list describes an undirected, unweighted graph, so there is no need to list "v u" when "u v" is already in the list.
PLEASE MAKE SURE THAT THE INPUT FILE DOES NOT CONTAIN DUPLICATE EDGES.
For example, a sample input file "input.txt" may contain:
1 3
5 2
8 7
8 9
7 2
.....
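Because DOCA expects a duplicate-free edge list with IDs starting from 1, it may help to check the file before running the program. The following C++ sketch is a hypothetical pre-flight check, not part of the DOCA release; checkEdgeList and EdgeListReport are illustrative names. It assumes that "v u" counts as a duplicate of "u v" (since the graph is undirected) and that any line which does not start with two integers is malformed.

```cpp
// Hypothetical pre-flight checker for a DOCA input edge list (not part of the release).
// Assumption: "v u" is treated as a duplicate of "u v" because the graph is undirected.
#include <algorithm>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <utility>

struct EdgeListReport {
    long edges = 0;       // lines with two valid IDs (duplicates included)
    long duplicates = 0;  // repeats of an earlier edge, in either orientation
    long badIds = 0;      // lines containing an ID smaller than 1
    long malformed = 0;   // non-blank lines that do not start with two integers
};

EdgeListReport checkEdgeList(std::istream& in) {
    EdgeListReport r;
    std::set<std::pair<long, long>> seen;
    std::string line;
    while (std::getline(in, line)) {
        // Skip blank lines (including lines that only hold whitespace or '\r').
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
        std::istringstream ls(line);
        long u = 0, v = 0;
        if (!(ls >> u >> v)) { ++r.malformed; continue; }
        if (u < 1 || v < 1) { ++r.badIds; continue; }
        ++r.edges;
        // Store each undirected edge with the smaller endpoint first,
        // so "u v" and "v u" map to the same key.
        std::pair<long, long> key(std::min(u, v), std::max(u, v));
        if (!seen.insert(key).second) ++r.duplicates;
    }
    return r;
}
```

For example, running checkEdgeList on a std::ifstream opened on "input.txt" and getting a nonzero duplicates or badIds count indicates that the file should be cleaned before running doca.exe.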

2. Output file
The output file name ends with the suffix _DOCA_XX, where XX indicates the overlapping threshold. The file contains multiple rows, each listing the nodes of one community of the input graph. For example, a sample output file "output_DOCA_75.txt" may contain:
1 3 5 27 8 32 91
2 4 5 23 44
4 6 8 10 23 45
.....
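This means nodes 1, 3, 5, 27, 8, 32, 91 are in community #1, nodes 2, 4, 5, 23, 44 are in community #2, and so on. Since communities may overlap, a node can appear in more than one row; for example, node 5 belongs to both community #1 and community #2.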
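A downstream analysis usually needs to know which communities each node belongs to. The C++ sketch below is one hypothetical way to read an output file in the format above; readCommunities and memberships are illustrative names, not functions from the DOCA code. Communities are numbered by row order starting at 1, matching the #1, #2 numbering used above.

```cpp
// Hypothetical reader for a DOCA output file (not part of the release).
// Each non-empty row is one community: whitespace-separated node IDs.
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Community = std::vector<long>;

std::vector<Community> readCommunities(std::istream& in) {
    std::vector<Community> communities;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        Community c;
        long node = 0;
        while (ls >> node) c.push_back(node);  // stops at the first non-integer token
        if (!c.empty()) communities.push_back(c);  // skip blank rows
    }
    return communities;
}

// Maps each node ID to the 1-based numbers of the communities containing it.
// Because communities may overlap, a node can map to several community numbers.
std::map<long, std::vector<std::size_t>> memberships(const std::vector<Community>& cs) {
    std::map<long, std::vector<std::size_t>> m;
    for (std::size_t i = 0; i < cs.size(); ++i)
        for (long node : cs[i]) m[node].push_back(i + 1);
    return m;
}
```

Applied to the three sample rows above, memberships would map node 5 to communities 1 and 2, and node 23 to communities 2 and 3.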

3. Command line
--
PLEASE MAKE SURE THAT YOU BUILD THE CODE AND THAT THE EXECUTABLE FILE IS UP TO DATE BEFORE RUNNING ANY EXPERIMENT.
--
The command line should have the form "doca.exe <input_file> <overlapping_threshold>", where <input_file> is the input edge list file and <overlapping_threshold> is a number in the open interval (0, 2).
For example:
	doca.exe arxiv.txt 0.70
	doca.exe facebook.txt 0.85
The executable file "doca.exe" can be found in either the "Release" folder (if you build in Release mode) or the "Debug" folder (if you build in Debug mode). Release mode is recommended because it runs faster.
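The usage above implies two checks on the arguments: exactly two are given, and the threshold is a number strictly between 0 and 2. The following C++ sketch shows one hypothetical way to perform them; parseDocaArgs is an illustrative name, and the actual DOCA source may validate its arguments differently.

```cpp
// Hypothetical argument check for "doca.exe <input_file> <overlapping_threshold>".
// Not taken from the DOCA source; it only mirrors the usage described in this README.
#include <cstdlib>
#include <string>

// Returns true and fills inputFile/threshold if argv holds exactly an input file
// name and a number strictly between 0 and 2; returns false otherwise.
bool parseDocaArgs(int argc, const char* const argv[],
                   std::string& inputFile, double& threshold) {
    if (argc != 3) return false;              // program name + 2 arguments
    char* end = nullptr;
    double t = std::strtod(argv[2], &end);
    if (end == argv[2] || *end != '\0') return false;  // not a complete number
    if (!(t > 0.0 && t < 2.0)) return false;  // open interval (0, 2); also rejects NaN
    inputFile = argv[1];
    threshold = t;
    return true;
}
```

In a main(int argc, char* argv[]) function, argv can be passed directly. Using std::strtod with an end pointer rejects inputs such as "0.7x", which std::atof would silently read as 0.7.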

-------
 Notes 
-------
0. The code was developed with Visual C++ 2010 Express. We thank Microsoft for providing this software.
1. This release supports graphs with at most N=63732 nodes. To analyze larger graphs, if memory allows, change the constant MAX_N in the file "prototypes.h" to the value you need and rebuild the code.
2. We expect this release to be free of bugs. However, if you find any bugs or memory issues, please contact the corresponding author at "nanguyen@cise.ufl.edu". We greatly appreciate your cooperation.
3. The five real social network traces used in the above paper are provided in the "socialTraces" folder for your convenience.
