Research Papers by Subject Areas

Network Optimization and Stability (on Overlay Content Distribution or Wireless Networks)

We formulate and solve an optimization problem related to swarming-type of content distribution or file sharing, where peers exchange file chunks. We applied an algorithm of gradient projection on simplices, by Bertsekas, etc. This is a very interesting problem and this paper is only the first step. At the core of the problem is optimal rate allocation on all possible multicast trees, a counterpart of the unicast rate-allocation problem where all possible paths are allowed. The algorithm turns out to be that, in each iterative step, selecting the least-cost tree and shift traffic from higher-cost trees to the least-cost tree.

This paper is about an optimization problem for joint server-selection and bandwidth-allocation to minimize the worst-case link congestion. We applied an algorithm of gradient projection on simplices, by Bertsekas, etc, and proved convergence results for the basic algorithm and the distributed version of it.

The problem involves time-sharing of permissible link schedules when wireless links interfere with each other. Problems like this also show up in multipath routing, particularly with multiple multicast trees. They cause certain technical difficulties to the subgradient algorithm, which is often used in network optimization. For instance, the subgradient solution uses one of the schedules each time and the time share variable does not converge in the normal sense. (We proved that it converges in the time average sense in the multipath routing setting in a current piece of work. See later.). And the convergence of the rate variable can be slow. Here, we combined a two-timescale algorithm with column generation. Such a problem also has a hard combinatorial subproblem, such as finding the maximum weighted independent set. We show how approximation algorithms to the subproblem translates into an approximation ratio to the overall utility-optimization problem.

This is related and extend the previous two papers. The problem here is very similar to the wireless link scheduling problem. The problem is how to form bigger file-distribution/exchange sessions from multiple smaller sessions for different files to achieve the fastest distribution speed, or equivalently, lowest worst-case link congestion. This is known as universal swarming. We proved that the time-share variable (for time-sharing of the optimal multicast trees) converges in the time average sense in the subgradient algorithm. The solution involves a hard subproblem of finding a minimum-cost Steiner tree (the counterpart of shortest path in the unicast case). We show how the approximation algorithm to the Steiner tree subproblem translates to the approximation ratio to the original objective function. We also integrate a column-generation approach into the subgradient algorithm to reduce the number of times the tree-selection subproblem is solved.

The paper studies the queueing/stability issue of our algorithm when the rate is feasible. The dual prices, which reflect the congestion degree at each link, are used to as weights to select the min-cost trees, or approximately min-cost trees. The algorithm is a time-sharing algorithm on these selected trees. The dual price update can be viewed as updating a virtual queue, which is shown to be stable. We relate the stability of the virtual queues (dual prices) to that of the real packet queues. This is generally difficult for multi-hop traffic. The fact that we have multicast traffic makes it even harder. To do this, we use network signaling to communicate the source rate information quickly to the links for the links to update the dual costs. We also regulate the source traffic rate. Then, under some priority-based scheduling, the real queues can be shown to be stable. We also considered a second approach of randomly selecting the multicast distribution trees.

 

Queueing and Other Probabilistic Models on Networks

In this paper, we analyze a model where packet disorder is caused by multi-path routing. Packets are generated according to a Poisson process. Then, they arrive at a disordering network modeled by two parallel M/M/1 queues, and are routed to each of the queues according to an independent Bernoulli process. A resequencing buffer follows the disordering network. In such a model, the packet resequencing delay is known. However, the size of the resequencing queue is unknown. We derive the probability for the large deviations of the queue size.

In this paper, we model packet disordering by adding an IID random propagation delay to each packet and derive simple expressions for the required buffer size and the resequencing delay. We demonstrate that these two quantities can be significant and show that the resequencing problem becomes worse as the link speed increases.

This is a draft, which intends to extend the Infocom 2004 paper. Here, we allow the arrival process to be more general than the Poisson process, such as the renewal process or Markov process, provided that the process satisfies the large deviation principle. The service process can be similarly generalized. The disordering network consists of two parallel queues. We derived the large-deviation type formula for the resequencing buffer size, which involves the moment generating functions. I believe the resulting formula is correct because it works for the case when the disordering network is two parallel M/M/1 queues. We assumed nice conditions for now and showed half of the inequality. We haven’t been able to show the other half of the inequality. The details are technical.

This paper studies the feasibility and algorithms for inferring the delay at each link in a communication network based on a large number of end-to-end measurements. The restriction is that we are not allowed to measure directly on each link and can only observe the route delays. It is assumed that we have considerable flexibility in choosing which routes to measure. We investigate two different cases: 1) each link delay is a constant and 2) each link delay is modeled as a random variable from a family of distributions with unknown parameters. We answer whether such indirect inference is possible at all, and when possible, how it can be carried out. The emphasis is on developing the maximum-likelihood estimators for scenario 2) when the link delays are modeled by exponential random variables or mixtures of exponentials. We have derived solutions based on the EM algorithm and demonstrated that, even though they do not necessarily reflect the true model parameters, they do seem to maximize the likelihood in most cases and that the resulting probability density functions match the true functions on regions where the probability mass concentrates.

The first half is basic probability calculation. The second half is more interesting. The results are quite clean and nice. In this paper, we analyze the statistical properties of a randomized binary search algorithm and its variants. The algorithms are motivated by how to conduct distributed search for a file in a peer-to-peer or content distribution network. The basic discrete version of the problem is as follows. Suppose there are total m items, numbered 1, 2, ..., m, out of which the first k items are marked, where k is unknown. The objective is to choose one of the marked items uniformly at random. In each step of the basic algorithm, a number y is chosen uniformly at random from 1 to x, where x is the number chosen in the previous step and is equal to m in the first step. A distributed query is made about y. If the server responsible for y replies that y is marked, the algorithm returns. We also consider batch versions of this algorithm in which multiple numbers are chosen in each step and multiple queries are made in parallel. We give the mean and variance (exact or asymptotic) for the number of search steps in each version of the algorithm, and when possible, we give its distribution. We also analyze the access or hit pattern to the entire search space.

 

(Discrete) Algorithms for Peer-to-Peer/Overlay/Content Distribution Networks

This paper deals with a discrete server-selection problem on the hypercube. We discovered a very simple but non-obvious algorithm for a rather unconventional optimization objective. We can show its optimality.

 

Miscellaneous Mathematical Results

I derived some basic inequalities, which can be viewed as a refinement of the Jensen’s inequality for the class of functions, or can be viewed as a generalization of a refined arithmetic mean-geometric mean inequality introduced by Cartwright and Field in 1978. The strength of the result is in bringing the variations of the data samples into consideration.

I give two refinements to the Chernoff bound for the sum of non-identical Bernoulli random variables with parameters pi, where 0 < pi < 1 and i = 1, ..., n. Traditionally, the Chernoff bound is a function of the arithmetic mean of the pi’s, . The refined bounds contain the term , and hence, are able to capture the variations of pi’s.

 

Sensor Network Energy-Efficient Routing and Control

 

Scheduling Bulk Data Transfer on Research Networks

 

Chip Multiprocessor (CMP) Cache Algorithm and Performance

 

Algorithms and Performance of Network Switches

 

Others