
Data Mining

Data mining is the discovery of useful patterns in data. It is used for prediction and classification, e.g., estimating the likelihood that a customer will migrate to a competitor.
OLAP (online analytical processing) is used to analyse historical data and to slice out the business information required. OLAP is often used by marketing managers. A slice of data that could be useful to a marketing manager is: how many customers between the ages of 24 and 25 who live in New York state buy over $2,000 worth of groceries a month?
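In an OLAP tool this question would typically be answered by slicing a multidimensional cube along the age, state and spending dimensions. As a minimal sketch, the same filter can be written over a plain list of records in Python. The `Customer` fields, the sample customers and the `grocery_slice` helper are hypothetical and were chosen only to mirror the question above.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical record layout; field names are illustrative assumptions.
    name: str
    age: int
    state: str
    monthly_grocery_spend: float  # US dollars per month


def grocery_slice(customers):
    """Customers aged 24-25 in New York state spending over $2,000 a month on groceries."""
    return [
        c for c in customers
        if 24 <= c.age <= 25
        and c.state == "NY"
        and c.monthly_grocery_spend > 2000
    ]


customers = [
    Customer("A", 24, "NY", 2500.0),
    Customer("B", 25, "NJ", 3000.0),  # wrong state
    Customer("C", 30, "NY", 2200.0),  # wrong age
    Customer("D", 25, "NY", 1800.0),  # spends too little
]
matches = grocery_slice(customers)
print(len(matches))  # how many customers fall in the slice
```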
Reporting tools are used to produce reports on the data, which are displayed to show relevance to the business and to keep track of key performance indicators.
Data visualization tools are used to display data from the data repository. Data visualization is often combined with data mining and OLAP tools, and it can allow the user to manipulate the data to show relevance and patterns.

Clustering:
Intuitively, clustering is the problem of finding clusters of points in the given data. The problem of clustering can be formalized using distance metrics in several ways. One way is to phrase it as the problem of grouping points into K sets (for a given K) so that the average distance of points from the centroid of their assigned cluster is minimized. Another way is to group points so that the average distance between every pair of points in each cluster is minimized.
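The first formulation is closely related to k-means. The sketch below uses Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. Strictly speaking, Lloyd's algorithm minimizes the sum of *squared* distances to the centroids, a close relative of the average-distance objective described above rather than exactly the same thing. The two objective helpers, the sample points and the fixed random seed are illustrative assumptions, not part of the original text.

```python
import itertools
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: split points into k clusters around centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving, so assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


def average_distance_to_centroid(centroids, clusters):
    """First formulation: mean distance of each point from its cluster's centroid."""
    distances = [math.dist(p, c) for c, members in zip(centroids, clusters) for p in members]
    return sum(distances) / len(distances)


def average_pairwise_distance(cluster):
    """Second formulation: mean distance over every pair of points in one cluster."""
    pairs = list(itertools.combinations(cluster, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
print(average_distance_to_centroid(centroids, clusters))
print([average_pairwise_distance(c) for c in clusters])
```

The second formulation is shown only as an objective to evaluate; minimizing it directly would need a different search procedure.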
Another type of clustering appears in classification systems in biology. For instance, leopards and humans are clustered under the class mammalia, while crocodiles and snakes are clustered under reptilia. The cluster mammalia has further subclusters, such as carnivora and primates. We thus have hierarchical clustering. Given the characteristics of different species, biologists have created a complex hierarchical clustering scheme that groups related species together at different levels of the hierarchy.
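A hierarchy like this can be built bottom-up by agglomerative clustering: start with every item in its own cluster, then repeatedly merge the two closest clusters and record each merge. The sketch below assumes single linkage (the distance between two clusters is the distance between their closest members), which is one of several common choices. The two-number feature vectors for the four animals are made up for illustration and are not real biological measurements.

```python
import itertools
import math


def single_linkage(vecs_a, vecs_b):
    """Cluster distance = distance between the two closest members."""
    return min(math.dist(a, b) for a in vecs_a for b in vecs_b)


def agglomerative(items):
    """Bottom-up hierarchical clustering.

    items maps a label to a feature vector. Returns the merges in order,
    each as (labels_left, labels_right, distance).
    """
    # Start with every item in its own singleton cluster.
    clusters = [(frozenset([label]), [vec]) for label, vec in items.items()]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that are closest under single linkage.
        i, j = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (labels_i, vecs_i), (labels_j, vecs_j) = clusters[i], clusters[j]
        merges.append((labels_i, labels_j, single_linkage(vecs_i, vecs_j)))
        # Replace the two clusters by their union: one more level of the hierarchy.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((labels_i | labels_j, vecs_i + vecs_j))
    return merges


# Hypothetical two-number "characteristics" per species (illustrative only).
species = {
    "leopard": (0.9, 1.0),
    "human": (1.0, 0.8),
    "crocodile": (0.1, 0.0),
    "snake": (0.0, 0.1),
}
for left, right, dist in agglomerative(species):
    print(sorted(left), "+", sorted(right), f"at {dist:.2f}")
```

Reading the merges from first to last gives the hierarchy from the most specific groups (here a "reptile" pair and a "mammal" pair) up to a single all-inclusive cluster.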
The statistics community has studied clustering extensively. Database research has provided scalable clustering algorithms that can cluster very large datasets; the Birch clustering algorithm is one such algorithm. Intuitively, data points are inserted into a multidimensional tree structure and guided to appropriate leaf nodes on the basis of their nearness to representative points in the internal nodes of the tree. Nearby points are thus clustered together in leaf nodes.
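Birch summarizes each group of points by a clustering feature (CF): the point count N, the linear sum LS of the points and the sum of squares SS. Because CFs are additive, a point can be absorbed into a summary without revisiting the raw data, and both the centroid LS/N and the radius can be computed from the summary alone. The sketch below is a deliberately flattened simplification: it keeps only a list of leaf-level CF entries and omits the multidimensional tree, node splitting and tree rebuilding that the real algorithm relies on. The `insert` helper, the sample points and the threshold value are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusteringFeature:
    """Summary of a group of points: count N, linear sum LS, sum of squares SS."""
    n: int
    ls: list
    ss: float

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), sum(x * x for x in point))

    def merged_with(self, other):
        # CFs are additive: merging two summaries never needs the raw points.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # Root-mean-square distance of the summarized points from their centroid.
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


def insert(entries, point, threshold):
    """Absorb point into the nearest entry if that entry's radius stays within threshold."""
    new = ClusteringFeature.from_point(point)
    if entries:
        nearest = min(range(len(entries)), key=lambda i: math.dist(entries[i].centroid(), point))
        candidate = entries[nearest].merged_with(new)
        if candidate.radius() <= threshold:
            entries[nearest] = candidate
            return
    entries.append(new)  # too far from every summary: open a new leaf entry


entries = []
for p in [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.2, 4.9)]:
    insert(entries, p, threshold=0.5)
print([(e.n, e.centroid()) for e in entries])
```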
An interesting application of clustering is to predict which new movies a person is likely to be interested in, on the basis of:

1. The person's past preferences in movies.
2. Other people with similar past preferences.
3. The preferences of such people for new movies.

To find people with similar past preferences, we cluster people based on their preferences for movies. The accuracy of clustering can be improved by first clustering movies by their similarity, so that even if people have not seen the same movies, they are clustered together if they have seen similar movies. We can repeat the clustering, alternately clustering people, then movies, then people and so on, until we reach an equilibrium.
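The three-step idea above can be sketched in a simplified form. Instead of the alternating people/movie clustering described in the text, the code below clusters people only, using a greedy "leader" scheme: each person joins the first cluster whose leader's ratings differ from theirs by at most a threshold, measured as the mean absolute difference over movies both have rated. A new movie's appeal to a person is then estimated as the average rating given by their cluster-mates. The 1-5 rating scale, the distance measure, the threshold, the people and the movie titles are all hypothetical.

```python
def rating_distance(a, b):
    """Mean absolute rating difference over movies both people rated (inf if none)."""
    common = a.keys() & b.keys()
    if not common:
        return float("inf")
    return sum(abs(a[m] - b[m]) for m in common) / len(common)


def leader_clusters(ratings, threshold):
    """Greedy 'leader' clustering of people by their past ratings."""
    clusters = []  # each cluster is a list of names; its first name is the leader
    for person, prefs in ratings.items():
        for cluster in clusters:
            if rating_distance(prefs, ratings[cluster[0]]) <= threshold:
                cluster.append(person)
                break
        else:  # no leader was close enough, so this person starts a new cluster
            clusters.append([person])
    return clusters


def predict(person, movie, ratings, clusters):
    """Average rating of movie among the person's cluster-mates (None if nobody rated it)."""
    cluster = next(c for c in clusters if person in c)
    scores = [ratings[p][movie] for p in cluster if p != person and movie in ratings[p]]
    return sum(scores) / len(scores) if scores else None


# Hypothetical 1-5 ratings; "Arrival" plays the role of the new movie.
ratings = {
    "ann": {"Alien": 5, "Heat": 1},
    "bob": {"Alien": 4, "Heat": 2, "Arrival": 5},
    "cat": {"Alien": 1, "Heat": 5, "Arrival": 2},
}
clusters = leader_clusters(ratings, threshold=1.0)
print(clusters)
print(predict("ann", "Arrival", ratings, clusters))
```

Leader clustering depends on the order in which people are visited, which is one reason an iterative scheme that refines the clusters repeatedly, like the alternating one described above, may give better groupings.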

