Data Mining

Data mining is the discovery of useful patterns in data. It is used for predictive analysis and classification, e.g. estimating the likelihood that a customer will migrate to a competitor.
OLAP (online analytical processing) is used to analyse historical data and to slice out the business information required. OLAP is often used by marketing managers. A slice of data that is useful to a marketing manager might answer a question such as: how many customers between the ages of 24 and 25 who live in New York State buy over $2,000 worth of groceries a month?
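The marketing question above can be read as a filter over three dimensions (age, state, monthly spend) followed by a count. Below is a minimal sketch in plain Python. The records, field names, and the `count_slice` helper are all hypothetical and invented for illustration; a real OLAP tool would run this kind of query against a warehouse rather than an in-memory list.

```python
# Hypothetical customer records; every field name and value is made up.
customers = [
    {"name": "A", "age": 24, "state": "NY", "monthly_grocery_spend": 2500.0},
    {"name": "B", "age": 25, "state": "NY", "monthly_grocery_spend": 1800.0},
    {"name": "C", "age": 24, "state": "NJ", "monthly_grocery_spend": 3000.0},
    {"name": "D", "age": 40, "state": "NY", "monthly_grocery_spend": 2200.0},
    {"name": "E", "age": 25, "state": "NY", "monthly_grocery_spend": 2100.0},
]


def count_slice(rows, min_age, max_age, state, min_spend):
    """Count rows inside the slice defined by an age range, a state, and a spend floor."""
    return sum(
        1
        for r in rows
        if min_age <= r["age"] <= max_age
        and r["state"] == state
        and r["monthly_grocery_spend"] > min_spend
    )


slice_count = count_slice(customers, 24, 25, "NY", 2000.0)
print(slice_count)  # prints 2 (customers A and E)
```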
Reporting tools are used to provide reports on the data. The reports are presented so as to show their relevance to the business and to keep track of key performance indicators.
Data visualization tools are used to display data from the data repository. Data visualization is often combined with data mining and OLAP tools. Visualization tools can allow the user to manipulate the data to show relevance and patterns.

Clustering:
Intuitively, clustering is the problem of finding clusters of points in the given data. The problem can be formalized from distance metrics in several ways. One way is to phrase it as the problem of grouping points into K sets (for a given K) so that the average distance of the points from the centroid of their assigned cluster is minimized. Another way is to group the points so that the average distance between every pair of points in each cluster is minimized.
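The first formulation (K sets, distance to the assigned centroid) is commonly approached with the k-means heuristic, also known as Lloyd's algorithm. The sketch below is one such approach, not the only one. Strictly speaking, k-means minimizes the sum of squared distances to the centroids, which is closely related to the average distance described above but not identical. The function names, the sample points, and the naive choice of the first K points as starting centroids are assumptions made for illustration.

```python
import math


def mean_point(points):
    """Component-wise mean (centroid) of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100):
    """Alternate between assigning points and updating centroids until stable."""
    centroids = list(points[:k])  # assumption: naive start from the first k points
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, assignment


def average_distance_to_centroid(points, centroids, assignment):
    """The quantity from the text: mean distance of points to their own centroid."""
    total = sum(math.dist(p, centroids[a]) for p, a in zip(points, assignment))
    return total / len(points)


points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, assignment = k_means(points, 2)
print(centroids)   # prints [(1.0, 1.5), (9.5, 9.0)]
print(assignment)  # prints [0, 0, 1, 1]
```

The result depends on the starting centroids, so practical implementations usually try several random starts and keep the best one.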
Another type of clustering appears in classification systems in biology. For instance, leopards and humans are clustered under the class Mammalia, while crocodiles and snakes are clustered under Reptilia. The Mammalia cluster has further clusters within it, such as Carnivora and Primates. We thus have hierarchical clustering. Given the characteristics of different species, biologists have created a complex hierarchical clustering scheme that groups related species together at different levels of the hierarchy.
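One common way to build such a hierarchy from data is bottom-up (agglomerative) clustering: start with every item in its own cluster and repeatedly merge the two closest clusters. The sketch below uses single-link distance (closest pair of members). The two-dimensional "feature" coordinates attached to each species are invented purely so that the example runs; they are not real biological measurements, and biologists do not derive taxonomies this way.

```python
import math
from itertools import combinations


def single_link(a, b):
    """Distance between two clusters: the closest pair of member points."""
    return min(math.dist(p, q) for p in a["points"] for q in b["points"])


def agglomerate(labelled_points):
    """Repeatedly merge the two closest clusters; return the merge tree as nested tuples."""
    clusters = [{"tree": label, "points": [pt]} for label, pt in labelled_points]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = {
            "tree": (clusters[i]["tree"], clusters[j]["tree"]),
            "points": clusters[i]["points"] + clusters[j]["points"],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0]["tree"]


# Hypothetical feature vectors, chosen only to make the grouping visible.
species = [
    ("leopard", (0.0, 0.0)),
    ("human", (1.0, 0.0)),
    ("crocodile", (10.0, 0.0)),
    ("snake", (10.0, 1.5)),
]
tree = agglomerate(species)
print(tree)  # prints (('leopard', 'human'), ('crocodile', 'snake'))
```

The nested tuples mirror the levels of the hierarchy: the inner pairs correspond to the mammal and reptile groups, and the outer pair to the level above them.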
The statistics community has studied clustering extensively. Database research has provided scalable clustering algorithms that can cluster very large datasets. The BIRCH clustering algorithm is one such algorithm. Intuitively, data points are inserted into a multidimensional tree structure and guided to appropriate leaf nodes on the basis of nearness to representative points in the internal nodes of the tree. Nearby points are thus clustered together.
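The core idea of guiding each incoming point to its nearest representative can be illustrated with a heavily simplified, single-level sketch. This is not the actual BIRCH algorithm: there is no tree, no node splitting, and the absorption test uses plain distance to the centroid with an assumed `threshold` parameter. The class and function names are hypothetical. What it does keep is the idea of summarizing each cluster by a count and a linear sum, so that points are processed one at a time and never revisited.

```python
import math


class ClusterFeature:
    """Compact summary of a cluster: point count and per-dimension linear sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def centroid(self):
        return tuple(s / self.n for s in self.linear_sum)

    def absorb(self, point):
        """Add a point to the summary without storing the point itself."""
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def incremental_cluster(points, threshold):
    """Send each point to the nearest summary, or start a new one if none is close."""
    features = []
    for p in points:
        nearest = min(features, key=lambda f: math.dist(p, f.centroid()), default=None)
        if nearest is not None and math.dist(p, nearest.centroid()) <= threshold:
            nearest.absorb(p)
        else:
            features.append(ClusterFeature(p))
    return features


stream = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (0.5, 0.5), (10.0, 11.0)]
features = incremental_cluster(stream, threshold=2.0)
print([f.n for f in features])  # prints [3, 2]
print(features[1].centroid())   # prints (10.0, 10.5)
```

Because each point is seen only once and only the summaries are kept, memory use grows with the number of clusters rather than the number of points, which is what makes this style of algorithm attractive for very large datasets.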
An interesting application of clustering is to predict what new movies a person is likely to be interested in, on the basis of:

1. The person's past preferences in movies.
2. Other people with similar past preferences.
3. The preferences of such people for new movies.

To find people with similar past preferences, we create clusters of people based on their preferences for movies. The accuracy of the clustering can be improved by first clustering movies by their similarity, so that even if people have not seen the same movies, they are clustered together if they have seen similar movies. We can repeat the clustering, alternately clustering people, then movies, then people, and so on until we reach an equilibrium.
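The three inputs listed above can be wired together in a small sketch. It groups people by the overlap of the movies they liked in the past, then ranks new movies by how many people in the same group liked them. The greedy threshold grouping is a simple stand-in for a real clustering algorithm, and the step of clustering the movies themselves (and alternating until equilibrium) is left out. All names, movie titles, and the `threshold` value are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_people(past_likes, threshold):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    clusters = []
    for person in past_likes:
        for cluster in clusters:
            if jaccard(past_likes[person], past_likes[cluster[0]]) >= threshold:
                cluster.append(person)
                break
        else:
            clusters.append([person])  # no similar cluster found, start a new one
    return clusters


def recommend(person, clusters, new_likes):
    """Rank new movies by how many of the person's cluster-mates liked them."""
    mates = next(c for c in clusters if person in c)
    votes = Counter(
        movie
        for mate in mates
        if mate != person
        for movie in new_likes.get(mate, ())
    )
    return [movie for movie, _ in votes.most_common()]


# Hypothetical data: past preferences and reactions to newly released movies.
past_likes = {
    "ann": {"space_1", "space_2", "heist_1"},
    "bob": {"space_1", "space_2"},
    "cat": {"romance_1", "romance_2"},
    "dan": {"space_1", "space_2", "heist_1"},
}
new_likes = {
    "bob": {"new_space"},
    "dan": {"new_space", "new_heist"},
    "cat": {"new_romance"},
}

clusters = cluster_people(past_likes, threshold=0.5)
print(clusters)                               # prints [['ann', 'bob', 'dan'], ['cat']]
print(recommend("ann", clusters, new_likes))  # prints ['new_space', 'new_heist']
```

Here "ann" has not rated any new movie, yet she receives suggestions because "bob" and "dan" share her past tastes and have already reacted to the new releases.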
