A Web crawler is an essential part of a search engine that procures information subsequently served by the search engine to its users. As the Web is becoming increasingly dynamic, in addition to discovering new web pages a crawler needs to keep revisiting those already in the search engine's index, in order to keep the index fresh by picking up the pages' changed content. Determining how often to recrawl pages requires making tradeoffs based on the pages' relative importance and change rates, subject to multiple resource constraints: the limited daily budget of crawl requests on the search engine's end, and politeness constraints restricting the rate at which pages can be requested from a given host. In this paper, we introduce PoliteBinaryLambdaCrawl, the first optimal algorithm for freshness crawl scheduling in the presence of politeness constraints as well as non-uniform page importance scores and the crawler's own crawl request limit. We also propose an approximation for it, stating its theoretical optimality conditions and in the process discovering a connection to an approach previously thought of as a mere heuristic for freshness crawl scheduling. We explore the relative performance of PoliteBinaryLambdaCrawl and other methods for handling politeness constraints on a dataset collected by crawling over 18.5M URLs daily over 14 weeks. (A toy sketch of this constrained-allocation tradeoff appears below, after the abstracts.)

We consider a resource-constrained updater, such as Google Scholar, which wishes to update the citation records of a group of researchers, who have different mean citation rates (and optionally, different importance coefficients), in such a way as to keep the overall citation index as up to date as possible. The updater is resource-constrained and cannot update the citations of all researchers all the time. In particular, it is subject to a total update rate constraint that it needs to distribute among individual researchers. We use a metric similar to the age of information: the long-term average difference between the actual citation numbers and the citation numbers according to the latest updates. We show that, in order to minimize this difference metric, the updater should allocate its total update capacity to researchers in proportion to the $square$ $roots$ of their mean citation rates. That is, more prolific researchers should be updated more often, but there are diminishing returns due to the concavity of the square root function. More generally, our paper addresses the problem of optimal operation of a resource-constrained sampler that wishes to track multiple independent counting processes in a way that is as up to date as possible.
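To make the square-root rule concrete, here is a minimal Python sketch of the stated allocation; the citation rates and the total budget are illustrative values, not taken from the paper:

    import math

    # Hypothetical mean citation rates (citations per month) for five researchers.
    citation_rates = [100.0, 25.0, 9.0, 4.0, 1.0]

    # Assumed total update capacity the updater can spread across researchers.
    total_budget = 10.0

    # Allocate capacity proportional to the square roots of the citation
    # rates, as the abstract above prescribes.
    weights = [math.sqrt(r) for r in citation_rates]
    update_rates = [total_budget * w / sum(weights) for w in weights]

    for rate, update in zip(citation_rates, update_rates):
        print(f"mean citation rate {rate:6.1f} -> update rate {update:.3f}")

The diminishing returns are visible directly: the first researcher accumulates citations 100 times faster than the last, but receives only 10 times the update rate, since $\sqrt{100}/\sqrt{1} = 10$.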
We consider a cache updating system with a source, a cache and a user. The source keeps the freshest version of the files, which are updated with known rates $\lambda_i$. The cache downloads and keeps the freshest version of the files from the source with rates $c_i$. The user gets updates from the cache with rates $u_i$. When the user gets an update, it either gets a fresh update from the cache, or the file at the cache has become outdated by a file update at the source. We find an analytical expression for the average freshness of the files at the user. Next, we generalize our setting to the case where there are multiple caches in between the source and the user, and find the average freshness at the user. We provide an alternating maximization based method to find the update rates for the cache(s), $c_i$, and for the user, $u_i$, that maximize the freshness of the files at the user. We observe that, for a given set of update rates for the user (resp. for the cache), the optimal rate allocation policy for the cache (resp. for the user) is a $threshold$ $policy$, under which the optimal update rates for rapidly changing files at the source may be equal to zero. Finally, we consider a system where multiple users are connected to a single cache, and find update rates for the cache and the users that maximize the total freshness over all users.

In this paper, we investigate a cache updating system with a server containing $N$ files, $K$ relays and $M$ users. The server keeps the freshest versions of the files, which are updated with fixed rates. Each relay can download the fresh files from the server in a certain period of time. Each user can get the fresh files from any relay, as long as that relay has stored the fresh versions of the requested files. Due to the limited storage and updating capacity of each relay, different cache designs lead to different average freshness of all updating files at the users.
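Since the analytical freshness expression itself is not reproduced in the abstract above, the following Monte Carlo sketch estimates the average freshness of one file in the single source-cache-user chain (which the relay system in the last abstract generalizes). It assumes all three event streams are Poisson, and the numerical rates are illustrative, not from the papers:

    import random

    def simulate_user_freshness(lam, c, u, horizon=100_000.0, seed=0):
        """Estimate the long-run fraction of time the user holds the
        freshest version of a single file.

        lam: update rate of the file at the source
        c:   rate at which the cache downloads the file from the source
        u:   rate at which the user gets updates from the cache
        """
        rng = random.Random(seed)
        t, fresh_time = 0.0, 0.0
        cache_fresh, user_fresh = True, True
        total_rate = lam + c + u
        while t < horizon:
            dt = min(rng.expovariate(total_rate), horizon - t)
            if user_fresh:
                fresh_time += dt
            t += dt
            if t >= horizon:
                break
            x = rng.random() * total_rate
            if x < lam:              # the file is updated at the source:
                cache_fresh = False  # both stored copies become stale
                user_fresh = False
            elif x < lam + c:        # the cache pulls the latest version
                cache_fresh = True
            else:                    # the user pulls from the cache; it is
                user_fresh = cache_fresh  # fresh only if the cache is fresh
        return fresh_time / horizon

    # Illustrative rates for one file: lambda_i = 1, c_i = 3, u_i = 5.
    print(simulate_user_freshness(1.0, 3.0, 5.0))

An estimate like this can be checked against the paper's closed-form expression, and it also makes the threshold behaviour plausible: when $\lambda_i$ is very large relative to $c_i$ and $u_i$, the simulated freshness stays near zero no matter how the user's rate is spent on that file, so diverting that rate to more slowly changing files is intuitive.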
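Returning to the recrawl-scheduling tradeoff in the first abstract: the sketch below is not PoliteBinaryLambdaCrawl (whose details are not given here), but a toy water-filling allocation under the common binary-freshness assumption that a page changing at Poisson rate $\Delta$ and crawled at Poisson rate $\rho$ is fresh a fraction $\rho/(\rho+\Delta)$ of the time; politeness is simplified to a per-page rate cap, whereas in reality it binds all pages sharing a host:

    import math

    def allocate_crawl_rates(importance, change_rates, caps, budget, iters=200):
        """Maximize sum_i importance[i] * r_i / (r_i + change_rates[i])
        subject to sum_i r_i <= budget and 0 <= r_i <= caps[i]."""
        def rates_for(mu):
            # Stationarity of the Lagrangian: w*d/(r+d)^2 = mu, clamped to [0, cap].
            return [max(0.0, min(cap, math.sqrt(w * d / mu) - d))
                    for w, d, cap in zip(importance, change_rates, caps)]

        lo, hi = 1e-12, 1e12          # bracket the multiplier mu
        for _ in range(iters):
            mu = math.sqrt(lo * hi)   # geometric bisection
            if sum(rates_for(mu)) > budget:
                lo = mu               # spending too much: raise the price
            else:
                hi = mu
        return rates_for(hi)

    # Three pages: important and fast-changing, unimportant and fast-changing,
    # unimportant and slow-changing (all values assumed, in events per day).
    print(allocate_crawl_rates([5.0, 1.0, 1.0], [2.0, 2.0, 0.1],
                               [3.0, 3.0, 3.0], budget=4.0))

The interplay the abstract describes shows up directly: raising a page's importance or change rate pulls crawl budget toward it until its politeness cap binds, after which the remaining budget flows to the other pages.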