

PAM: An Efficient and Privacy-Aware Monitoring Framework for Continuously Moving Objects

Abstract
Efficiency and privacy are two fundamental issues in moving object monitoring. This paper proposes a privacy-aware monitoring (PAM) framework that addresses both issues. The framework distinguishes itself from the existing work by being the first to holistically address the issues of location updating in terms of monitoring accuracy, efficiency, and privacy, particularly, when and how mobile clients should send location updates to the server. Based on the notions of safe region and most probable result, PAM performs location updates only when they would likely alter the query results. Furthermore, by designing various client update strategies, the framework is flexible and able to optimize accuracy, privacy, or efficiency. We develop efficient query evaluation/reevaluation and safe region computation algorithms in the framework. The experimental results show that PAM substantially outperforms traditional schemes in terms of monitoring accuracy, CPU cost, and scalability while achieving close-to-optimal communication cost.
Existing System:
Accuracy is low, since the query results are correct only at the time instants of periodic updates, but not in between them or at the times of deviation updates.

Updates are performed regardless of whether any queries exist. A high update frequency may improve monitoring accuracy, but only at the cost of unnecessary updates and query reevaluation.

The privacy issue is simply ignored by assuming that the clients are always willing to provide their exact positions to the server.
Proposed System:
In our approach, the safe region is maintained using an object index, a query index, a query processor, and a location manager. As for efficiency, the framework significantly reduces location updates to only those cases in which an object moves out of its safe region and is therefore likely to alter the query results.
The safe region is computed based on the queries in such a way that the current results of all queries remain valid as long as all objects reside inside their respective safe regions.
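As a rough sketch of the client-side update rule implied by the safe region, the code below reports a location only when the object has left its safe region. This is a minimal illustration only, not the PAM implementation: the rectangular safe region, the field names, and the send_update callback are assumptions made for the example (PAM derives safe regions from the registered queries, and they need not be rectangles).

    # Minimal sketch of the client-side update decision in a safe-region scheme.
    # The rectangular region and helper names are assumptions for illustration.
    from dataclasses import dataclass

    @dataclass
    class SafeRegion:
        x_min: float
        y_min: float
        x_max: float
        y_max: float

        def contains(self, x: float, y: float) -> bool:
            return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

    def on_location_change(x, y, safe_region, send_update):
        """Send a location update only when the object leaves its safe region,
        because only then is the current query result likely to change."""
        if not safe_region.contains(x, y):
            send_update(x, y)   # server reevaluates queries and returns a new safe region
            return True         # an update was sent
        return False            # no communication needed

The point of the sketch is that communication cost is driven entirely by how often objects cross their safe-region boundaries, which is exactly what the framework tries to minimize.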
Modules:
1. Privacy-Aware Location Model
2. Object Index
3. Query Index
4. Query Processor and Location Manager
5. Spatial Relations






If you want more information, it is available here: DOWNLOAD
Delay Analysis and Optimality of Scheduling Policies for Multi-Hop Wireless Networks
Abstract
We analyze the delay performance of a multi-hop wireless network in which the routes between source-destination pairs are fixed. We develop a new queue-grouping technique to handle the complex correlations of the service process resulting from the multi-hop nature of the flows and their mutual sharing of the wireless medium. A general set-based interference model is assumed that imposes constraints on which links can be served simultaneously at any given time. These interference constraints are used to obtain a fundamental lower bound on the delay performance of any scheduling policy for the system. We present a systematic methodology to derive such lower bounds. For a special wireless system, namely the clique, we design a policy that is sample-path delay optimal. For the tandem queue network, where the delay-optimal policy is known, the expected delay of the optimal policy numerically coincides with the lower bound. The lower-bound analysis provides useful insights into the design and analysis of optimal or nearly optimal scheduling policies.
Existing System
A large number of studies on multi-hop wireless networks have been devoted to system stability while maximizing metrics like throughput or utility. These metrics measure the performance of a system over a long time scale. For a large class of applications, such as video or voice over IP and embedded network control, and for system design, metrics like delay are of prime importance. The delay performance of wireless networks, however, has largely been an open problem. This problem is notoriously difficult even in the context of wireline networks, primarily because of the complex interactions in the network (e.g., superposition, routing, departure) that make its analysis amenable only in very special cases like product-form networks. The problem is further exacerbated by the mutual interference inherent in wireless networks, which complicates both the scheduling mechanisms and their analysis. Some novel analytical techniques to compute useful lower bounds and delay estimates for wireless networks with single-hop traffic have been developed in earlier work. However, that analysis is not directly applicable to multi-hop wireless networks with multi-hop flows, due to the difficulty in characterizing the departure process at intermediate links. The metric of interest in this paper is the system-wide average delay of a packet from its source to its corresponding destination. We present a new, systematic methodology to obtain a fundamental lower bound on the average packet delay in the system under any scheduling policy. Furthermore, we re-engineer well-known scheduling policies to achieve good delay performance vis-a-vis the lower bound.
Proposed System
We analyze a multi-hop wireless network with multiple source-destination pairs, given routing and traffic information. Each source injects packets into the network, which traverse the network until they reach their destination. For example, a multi-hop wireless network with three flows is shown in Fig. 1. The exogenous arrival processes A_I(t), A_II(t), and A_III(t) correspond to the number of packets injected into the system at time t. A packet is queued at each node in its path, where it waits for an opportunity to be transmitted. Since the transmission medium is shared, concurrent transmissions can interfere with each other. Sets of links that do not cause interference with each other can be scheduled simultaneously, and we call them activation vectors (matchings). We do not impose any a priori restriction on the set of allowed activation vectors, i.e., they can characterize any combinatorial interference model. For example, in a K-hop interference model, the links scheduled simultaneously are separated by at least K hops. In the example shown in Fig. 1, each link has unit capacity, i.e., at most one packet can be transmitted in a slot. For the above example, we assume a 1-hop interference model.
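As a rough illustration of these interference constraints, the sketch below checks whether a set of links forms a valid activation vector under a 1-hop interference model, where no two scheduled links may share an endpoint. The link representation as node pairs is an assumption made for the example; the paper's model allows arbitrary set-based constraints.

    # Sketch: validity check for an activation vector under a 1-hop interference model.
    # Links are (u, v) node pairs; under 1-hop interference, two links that share a
    # node cannot be scheduled in the same slot. Illustrative model only.
    def is_valid_activation_vector(links):
        used_nodes = set()
        for u, v in links:
            if u in used_nodes or v in used_nodes:
                return False        # two links would interfere (share a node)
            used_nodes.update((u, v))
        return True

    # Example on a tandem network 1-2-3-4: (1,2) and (3,4) can be scheduled together,
    # but (1,2) and (2,3) cannot.
    print(is_valid_activation_vector([(1, 2), (3, 4)]))   # True
    print(is_valid_activation_vector([(1, 2), (2, 3)]))   # False

A scheduling policy picks one such activation vector per slot, and the delay lower bound holds no matter how this choice is made.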

If you want more information, it is available here: DOWNLOAD
 

Ranking Spatial Data by Quality Preferences
Abstract:
A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example, using a real estate agency database of flats for lease, a customer may want to rank the flats with respect to the appropriateness of their location, defined after aggregating the qualities of other features (e.g., restaurants, cafes, hospitals, markets, etc.) within their spatial neighborhood. Such a neighborhood concept can be specified by the user via different functions. It can be an explicit circular region within a given distance from the flat. Another intuitive definition is to assign higher weights to the features based on their proximity to the flat. In this paper, we formally define spatial preference queries and propose appropriate indexing techniques and search algorithms for them. Extensive evaluation of our methods on both real and synthetic data reveals that an optimized branch-and-bound solution is efficient and robust with respect to different parameters.
Existing System:
To our knowledge, there is no existing efficient solution for processing the top-k spatial preference query. Object ranking is a popular retrieval task in various applications. In relational databases, we rank tuples using an aggregate score function on their attribute values. For example, a real estate agency maintains a database that contains information on flats available for rent. A potential customer wishes to view the top-10 flats with the largest sizes and lowest prices. In this case, the score of each flat is expressed as the sum of two qualities, size and price, after normalization to a common domain (e.g., 1 means the largest size and the lowest price). In spatial databases, ranking is often associated with nearest neighbor (NN) retrieval. Given a query location, we are interested in retrieving the set of nearest objects to it that satisfy a condition (e.g., restaurants). Assuming that the set of interesting objects is indexed by an R-tree, we can apply distance bounds and traverse the index in a branch-and-bound fashion to obtain the answer.
Proposed System:
We propose (i) spatial ranking, which orders the objects according to their distance from a reference point, and (ii) non-spatial ranking, which orders the objects by an aggregate function on their non-spatial values. Our top-k spatial preference query integrates these two types of ranking in an intuitive way. As indicated by our examples, this new query has a wide range of applications in service recommendation and decision support systems. To our knowledge, there is no existing efficient solution for processing the top-k spatial preference query. A brute-force approach (to be elaborated in Section 3.2) for evaluating it is to compute the scores of all objects in D and select the top-k ones. This method, however, is expected to be very expensive for large input datasets.
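A minimal sketch of this brute-force baseline is given below: every object is scored by aggregating the qualities of features lying within a given radius, and the k best objects are kept. The flat/feature data layout, the range-based neighborhood, and the plain linear scan (instead of an R-tree traversal) are all simplifying assumptions for illustration.

    # Brute-force top-k spatial preference sketch (no index).
    # An object's score is the sum, over feature types, of the best quality found
    # within `radius` of the object. Data layout is assumed for the example.
    import math
    import heapq

    def top_k_spatial_preference(objects, features, radius, k):
        """objects: list of (id, x, y); features: dict feature_type -> list of (x, y, quality)."""
        def score(ox, oy):
            total = 0.0
            for points in features.values():
                best = 0.0
                for fx, fy, quality in points:
                    if math.hypot(ox - fx, oy - fy) <= radius:
                        best = max(best, quality)   # best nearby quality of this type
                total += best
            return total

        return heapq.nlargest(k, ((score(x, y), oid) for oid, x, y in objects))

The optimized branch-and-bound solution in the paper avoids exactly this exhaustive scoring by pruning R-tree subtrees whose score bounds cannot enter the top-k.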
Module Description:
  1. Spatial Ranking
  2. Non-Spatial Ranking
  3. Nearest Neighbor (NN) Retrieval
  4. Spatial Query Evaluation on R-trees

Spatial Ranking
Spatial ranking orders the objects according to their distance from a reference point.
Non-Spatial Ranking:
Non-spatial ranking orders the objects by an aggregate function on their non-spatial values. Our top-k spatial preference query integrates these two types of ranking in an intuitive way.

If you want more information, it is available here: DOWNLOAD

Extended XML Tree Pattern Matching: Theories and Algorithms
ABSTRACT
As businesses and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior work demonstrates that holistic twig pattern matching is an efficient technique for answering an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath, XQuery) define more axes and functions, such as the negation function, order-based axes, and wildcards. Here we study a large class of XML tree patterns, called extended XML tree patterns, which may include P-C and A-D relationships, negation functions, wildcards, and order restrictions. We establish a theoretical framework around the notion of a “matching cross”, which captures the intrinsic reason behind the optimality of holistic algorithms. Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. A set of experimental results on both real-life and synthetic data sets demonstrates the effectiveness and efficiency of our proposed theories and algorithms.
Existing System
Previous algorithms focus on XML tree pattern queries with only P-C and A-D relationships. Little work has been done on XML tree queries that may contain wildcards, negation functions, and order restrictions, all of which are frequently used in XML query languages such as XPath and XQuery. In this article, we call an XML tree pattern with negation functions, wildcards, and/or order restrictions an extended XML tree pattern. Previous XML tree pattern matching algorithms do not fully exploit the “optimality” of holistic algorithms.
Proposed System
We build a theoretical framework on the optimal processing of XML tree pattern queries. We show that the “matching cross” is the key cause of the sub-optimality of holistic algorithms. Intuitively, a matching cross describes a dilemma in which holistic algorithms have to decide whether to output useless intermediate results or to miss useful results. The fact that TwigStack is optimal for queries with only A-D relationships can be explained by the observation that no matching cross can occur in any XML document with respect to queries with only A-D edges. We classify matching crosses into bounded and unbounded matching crosses (BMC and UMC). We develop theorems to show that only part of the UMC class (i.e., UMC with a mediator) can force holistic algorithms to potentially output useless intermediate results. Based on this theoretical analysis, we develop a series of holistic algorithms, TreeMatch, that achieve a large optimal query class for Q^{/,//,*}. Our main technique is to use a concise encoding to represent matching results, which leads to a reduction of useless intermediate results. We conducted an extensive set of experiments on synthetic and real data sets for performance comparison. We compared TreeMatch with four previous holistic XML tree pattern matching algorithms. The experimental results show that our algorithm correctly processes extended XML tree patterns and achieves a performance speedup on the tested queries and data sets, even within the restricted focus of the previous algorithms. The improvement is mainly due to the reduction of the size of the intermediate results.
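The structural relationships that such tree-pattern matching algorithms test can be sketched with the standard region encoding of XML elements, where each element carries a (start, end, level) triple from document order. The encoding and helper names below are assumptions for illustration, not the TreeMatch implementation itself.

    # Sketch: structural-relationship tests used by holistic XML tree pattern matching,
    # based on a region encoding (start, end, level) of elements. Illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Element:
        start: int   # position of the opening tag in document order
        end: int     # position of the matching closing tag
        level: int   # depth in the document tree

    def is_ancestor(a: Element, d: Element) -> bool:
        """Ancestor-descendant (A-D) edge: a's region strictly encloses d's region."""
        return a.start < d.start and d.end < a.end

    def is_parent(a: Element, d: Element) -> bool:
        """Parent-child (P-C) edge: containment plus exactly one level apart."""
        return is_ancestor(a, d) and d.level == a.level + 1

Holistic algorithms such as TwigStack and TreeMatch evaluate these tests over sorted element streams, which is what allows them to bound the intermediate results they produce.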
Modules:
1. Optimality of holistic algorithm:
Previous XML tree pattern matching algorithms do not fully exploit the “optimality” of holistic algorithms. TwigStack guarantees that there is no useless intermediate result for queries with only Ancestor-Descendant (A-D) relationships; therefore, TwigStack is optimal for queries with only A-D edges. Another algorithm, TwigStackList, enlarges the optimal query class of TwigStack by including Parent-Child (P-C) relationships in non-branching edges. A natural question is whether the optimal query class of TwigStackList can be further enlarged. Hence, the current open problems include (1) how to identify a larger query class that can be processed optimally and (2) how to efficiently answer a query that cannot be guaranteed to be processed optimally. This module explores these challenges and shows the promise of a novel theoretical framework, the “matching cross”, for identifying a large optimal query class for extended XML tree queries.

If you want more information, it is available here: DOWNLOAD

Effective Navigation of Query Results Based on Concept Hierarchies
Abstract:

Search queries on biomedical databases, such as PubMed, often return a large number of results, only a small subset of which is relevant to the user. Ranking and categorization, which can also be combined, have been proposed to alleviate this information overload problem. Result categorization for biomedical databases is the focus of this work. A natural way to organize biomedical citations is according to their MeSH annotations; MeSH is a comprehensive concept hierarchy used by PubMed. In this paper, we present the BioNav system, a novel search interface that enables the user to navigate a large number of query results by organizing them using the MeSH concept hierarchy. First, the query results are organized into a navigation tree. At each node expansion step, BioNav reveals only a small subset of the concept nodes, selected such that the expected user navigation cost is minimized. In contrast, previous works expand the hierarchy in a predefined static manner, without navigation cost modeling. We show that the problem of selecting the best concepts to reveal at each node expansion is NP-complete and propose an efficient heuristic as well as a feasible optimal algorithm for relatively small trees. We show experimentally that BioNav outperforms state-of-the-art categorization systems with respect to the user navigation cost. We have implemented BioNav for the MEDLINE database at http://db.cse.buffalo.edu/bionav.
Existing System
Information overload is a major problem with existing search operations on biomedical databases such as PubMed, where typically a large number of citations are returned, of which only a small subset is relevant to the user.

Proposed System
Existing proposals dynamically categorize SQL query results by inferring a hierarchy based on the characteristics of the result tuples. Their domain is the tuple attributes, and their problem is how to organize them hierarchically in order to minimize the navigation cost. They also decide the value ranges for each attribute, both categorical and numerical, and how to rank them. One of these systems takes the user's preferences into consideration during the inference for a more personalized experience. Once the hierarchy is inferred, they follow a static navigation method. BioNav is distinct in that it offers dynamic navigation on a predefined hierarchy, namely the MeSH concept hierarchy. Hence, BioNav is complementary to these systems, since it can be used to optimize the navigation after these systems construct the navigation tree.
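The navigation-cost idea can be sketched as follows: the expected cost of revealing a set of concept nodes combines the effort of scanning the revealed labels with the expected effort of exploring the nodes the user is likely to drill into. The cost constants, the probability function, and the node representation below are assumptions for illustration; BioNav's actual cost model and selection heuristic are more involved.

    # Sketch of an expected navigation-cost estimate for a set of revealed concept nodes.
    # p_descend(node) is an (assumed) probability that the user drills into that node;
    # the unit costs for labels and citations are illustrative constants.
    LABEL_COST = 1.0       # cost of examining one revealed concept label
    CITATION_COST = 1.0    # cost of reading one citation of an expanded node

    def expected_navigation_cost(revealed_nodes, p_descend):
        cost = LABEL_COST * len(revealed_nodes)        # user scans every revealed label
        for node in revealed_nodes:
            # with some probability the user descends and inspects the node's citations
            cost += p_descend(node) * CITATION_COST * node["num_citations"]
        return cost

    # A greedy heuristic would repeatedly pick the subset of child concepts minimizing
    # this estimate, since choosing the optimal subset is NP-complete in general.

This makes concrete why a static expansion scheme can be far from optimal: it ignores how the result counts and descent probabilities differ across concept nodes.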

If you want more information, it is available here: DOWNLOAD

Discovering Conditional Functional Dependencies
Abstract:
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension of functional dependencies (FDs) by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. However, finding CFDs is an expensive process that involves intensive manual effort. To effectively identify data cleaning rules, we develop techniques for discovering CFDs from sample relations. We provide three methods for CFD discovery. The first, referred to as CFDMiner, is based on techniques for mining closed itemsets, and is used to discover constant CFDs, namely, CFDs with constant patterns only. The other two algorithms are developed for discovering general CFDs. The first algorithm, referred to as CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, referred to as FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs. It leverages closed-itemset mining to reduce the search space. Our experimental results demonstrate the following. (a) CFDMiner can be multiple orders of magnitude faster than CTANE and FastCFD for constant CFD discovery. (b) CTANE works well when a given sample relation is large, but it does not scale well with the arity of the relation. (c) FastCFD is far more efficient than CTANE when the arity of the relation is large.
Algorithm Used:
             Levelwise Algorithm
Existing System:
As remarked earlier, constant CFDs are particularly important for object identification, and thus deserve a separate treatment. One wants efficient methods to discover constant CFDs alone, without paying the price of discovering all CFDs. Indeed, as will be seen later, constant CFD discovery is often several orders of magnitude faster than general CFD discovery. Levelwise algorithms may not perform well on sample relations of large arity, given their inherent exponential complexity. More effective methods have to be in place to deal with datasets with a large arity. A host of techniques have been developed for (non-redundant) association rule mining, and it is only natural to capitalize on these for CFD discovery. As we shall see, these techniques can not only be readily used in constant CFD discovery, but also significantly speed up general CFD discovery. To our knowledge, no previous work has considered these issues for CFD discovery.
Proposed System:

In light of these considerations we provide three algorithms for CFD discovery: one for discovering constant CFDs, and the other two for general CFDs.

(Module: 1) We propose a notion of minimal CFDs based on both the minimality of attributes and the minimality of patterns. Intuitively, minimal CFDs contain neither redundant attributes nor redundant patterns. Furthermore, we consider frequent CFDs that hold on a sample dataset r, namely, CFDs in which the pattern tuples have a support in r above a certain threshold. Frequent CFDs allow us to accommodate unreliable data with errors and noise. Our algorithms find minimal and frequent CFDs to help users identify quality cleaning rules from a possibly large set of CFDs that hold on the samples.
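The support notion behind frequent CFDs can be illustrated with a small sketch: count how many tuples of the sample relation agree with a pattern tuple, where '_' marks an unconstrained (variable) position. The relation layout and the wildcard convention are assumptions made for this example, not the paper's formal definitions.

    # Sketch: support of a CFD pattern tuple over a sample relation.
    # A pattern maps attributes to either a constant or '_' (any value); a tuple
    # matches if it agrees on every constant. Layout and '_' convention are assumed.
    def support(relation, pattern):
        """relation: list of dicts (tuples); pattern: dict attribute -> constant or '_'."""
        def matches(tup):
            return all(v == '_' or tup.get(a) == v for a, v in pattern.items())
        return sum(1 for tup in relation if matches(tup))

    r = [
        {"country": "US", "area_code": "212", "city": "NYC"},
        {"country": "US", "area_code": "212", "city": "NYC"},
        {"country": "UK", "area_code": "020", "city": "London"},
    ]
    # Pattern tuple of a constant CFD candidate ([country, area_code] -> city):
    print(support(r, {"country": "US", "area_code": "212", "city": "NYC"}))  # 2

A CFD is then considered frequent when this count reaches the user-specified threshold, which is how the algorithms tolerate noisy or unreliable samples.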

(Module: 2) Our first algorithm, referred to as CFDMiner, is for constant CFD discovery. We explore the connection between minimal constant CFDs and closed and free patterns. Based on this, CFDMiner finds constant CFDs by leveraging a recent mining technique proposed in [24], which mines closed itemsets and free itemsets in parallel following a depth-first search scheme.

(Module: 3) Our second algorithm, referred to as CTANE, extends TANE to discover general CFDs. It is based on an attribute-set/pattern tuple lattice, and mines CFDs at level k + 1 of the lattice (i.e., when each set at the level consists of k+1 attributes) with pruning based on those at level k. CTANE discovers minimal CFDs only.
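The levelwise lattice traversal that CTANE inherits from TANE can be sketched with Apriori-style candidate generation: attribute sets at level k+1 are formed by joining level-k sets that differ in one attribute, and a candidate survives only if all of its k-subsets survived pruning. This is a generic sketch of the levelwise principle under those assumptions, not CTANE's actual handling of pattern tuples.

    # Sketch: Apriori-style levelwise candidate generation over attribute sets.
    # Level k+1 candidates are unions of two level-k sets differing in one attribute,
    # kept only if every k-subset is still a candidate (standard levelwise pruning).
    from itertools import combinations

    def next_level(level_k):
        """level_k: set of frozensets, each of size k. Returns candidate sets of size k+1."""
        level_k = set(level_k)
        candidates = set()
        for a, b in combinations(level_k, 2):
            union = a | b
            if len(union) != len(a) + 1:
                continue
            if all(frozenset(sub) in level_k for sub in combinations(union, len(a))):
                candidates.add(union)
        return candidates

    # Example: from {A,B}, {A,C}, {B,C} we obtain the single candidate {A,B,C}.
    lvl2 = {frozenset("AB"), frozenset("AC"), frozenset("BC")}
    print(next_level(lvl2))

The exponential growth of this lattice with the number of attributes is precisely why CTANE does not scale well with the arity of the relation.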

(Module: 4) Our third algorithm, referred to as FastCFD, discovers general CFDs by employing a depth-first search strategy instead of the levelwise approach. It is a nontrivial extension of FastFD mentioned above, by mining pattern tuples. A novel pruning technique is introduced by FastCFD, by leveraging constant CFDs found by CFDMiner. As opposed to CTANE, FastCFD does not take exponential time in the arity of sample data when a canonical cover of CFDs is not exponentially large.


If you want more information, it is available here: DOWNLOAD

Decision Trees for Uncertain Data
ABSTRACT:
              Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the “complete information” of a data item (taking into account the probability density function (pdf)) is utilized.
           We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted that show that the resulting classifiers are more accurate than those using value averages. Since processing pdf’s is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
EXISTING SYSTEM :
            In traditional decision-tree classification, a feature (an attribute) of a tuple is either categorical or numerical. For the latter, a precise and definite point value is usually assumed. In many applications, however, data uncertainty is common. The value of a feature/attribute is thus best captured not by a single point value, but by a range of values giving rise to a probability distribution. Although the previous techniques can improve the efficiency of means, they do not consider the spatial relationship among cluster representatives, nor make use of the proximity between groups of uncertain objects to perform pruning in batch. A simple way to handle data uncertainty is to abstract probability distributions by summary statistics such as means and variances. We call this approach Averaging. Another approach is to consider the complete information carried by the probability distributions to build a decision tree. We call this approach Distribution-based.
PROPOSED SYSTEM :
           We study the problem of constructing decision tree classifiers on data with uncertain numerical attributes. Our goals are (1) to devise an algorithm for building decision trees from uncertain data using the Distribution-based approach; (2) to investigate whether the Distribution-based approach could lead to a higher classification accuracy compared with the Averaging approach; and (3) to establish a theoretical foundation on which pruning techniques are derived that can significantly improve the computational efficiency of the Distribution-based algorithms.
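The difference between the Averaging and Distribution-based approaches can be sketched at the level of a single split evaluation: Averaging collapses each uncertain value to its mean before counting how many tuples fall on each side of a split point, while the Distribution-based approach spreads each tuple fractionally according to the probability mass of its pdf on each side. The discrete sample representation of a pdf below is an assumption made for illustration.

    # Sketch: counting tuples on each side of a split point z for an uncertain attribute.
    # Each uncertain value is a list of discrete pdf samples [(value, probability)].
    # "Averaging" uses only the mean; "Distribution-based" uses the full pdf mass.
    def averaging_counts(tuples, z):
        left = sum(1 for pdf in tuples
                   if sum(v * p for v, p in pdf) <= z)      # collapse pdf to its mean
        return left, len(tuples) - left

    def distribution_counts(tuples, z):
        left = sum(sum(p for v, p in pdf if v <= z) for pdf in tuples)  # fractional mass
        return left, len(tuples) - left

    t = [[(1.0, 0.5), (9.0, 0.5)],   # mean 5.0, but half its mass lies far on each side
         [(4.0, 1.0)]]
    print(averaging_counts(t, 5.0))     # (2, 0)
    print(distribution_counts(t, 5.0))  # (1.5, 0.5)

The fractional counts feed into the entropy computation of each candidate split, which is why the Distribution-based approach can choose better splits but also costs more CPU, motivating the pruning techniques.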

If you want more information, click here: DOWNLOAD
A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification
Abstract
            Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters, based on similarity test. Words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with statistical mean and deviation. When all the words have been fed in, a desired number of clusters are formed automatically. We then have one extracted feature for each cluster. The extracted feature, corresponding to a cluster, is a weighted combination of the words contained in the cluster. By this algorithm, the derived membership functions match closely with and describe properly the real distribution of the training data. Besides, the user need not specify the number of extracted features in advance, and trial-and-error for determining the appropriate number of extracted features can then be avoided. Experimental results show that our method can run faster and obtain better extracted features than other methods.

Index Terms: Fuzzy similarity, feature clustering, feature extraction, feature reduction, text classification.
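The self-constructing clustering step described in the abstract can be sketched as follows: each cluster keeps a mean and deviation per dimension, a Gaussian-style membership function scores how well a word pattern fits a cluster, and a word either joins the best-fitting cluster (updating its statistics) or starts a new cluster when no membership exceeds a threshold. The membership formula and update rules below are simplified assumptions, not the paper's exact equations.

    # Sketch: fuzzy similarity-based self-constructing clustering of word patterns.
    # Each cluster keeps per-dimension mean and deviation; a Gaussian-like membership
    # decides whether a word joins an existing cluster or founds a new one.
    import math

    def membership(x, mean, sigma):
        return math.prod(math.exp(-((xi - mi) / si) ** 2)
                         for xi, mi, si in zip(x, mean, sigma))

    def self_constructing_cluster(patterns, threshold=0.3, sigma0=0.2):
        clusters = []   # each: {"mean": [...], "sigma": [...], "members": [...]}
        for x in patterns:
            scores = [membership(x, c["mean"], c["sigma"]) for c in clusters]
            if not clusters or max(scores) < threshold:
                # no cluster is similar enough: start a new cluster around this word
                clusters.append({"mean": list(x), "sigma": [sigma0] * len(x), "members": [x]})
            else:
                c = clusters[scores.index(max(scores))]
                c["members"].append(x)
                n = len(c["members"])
                # incremental mean update; deviation update kept trivially simple here
                c["mean"] = [m + (xi - m) / n for m, xi in zip(c["mean"], x)]
        return clusters

Because clusters are created on demand, the number of extracted features emerges from the data and the threshold, which is why the user does not have to fix it in advance.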
           
Existing System:
The first feature extraction method based on feature clustering was proposed by Baker and McCallum, derived from the “distributional clustering” idea of Pereira et al. Al-Mubaid and Umair used distributional clustering to generate an efficient representation of documents and applied a learning-logic approach for training text classifiers. The Agglomerative Information Bottleneck approach was proposed by Tishby et al. The divisive information-theoretic feature clustering algorithm was proposed by Dhillon et al.; it is an information-theoretic feature clustering approach and is more effective than other feature clustering methods. In these feature clustering methods, each new feature is generated by combining a subset of the original words. However, difficulties are associated with these methods. A word is exactly assigned to a subset, i.e., hard clustering, based on the similarity magnitudes between the word and the existing subsets, even if the differences among these magnitudes are small. Also, the mean and the variance of a cluster are not considered when similarity with respect to the cluster is computed. Furthermore, these methods require the number of new features to be specified in advance by the user.

The full project is available here: DOWNLOAD