Data mining is the non-trivial discovery of implicit, previously
unknown, and potentially useful information from data in large
databases. It is thus a core element of knowledge discovery, and the
two terms are often used synonymously. Before mining, the data are
integrated and cleaned so that only the relevant data are retained.
Data mining presents its findings in a form that is clear not only to
data mining analysts but also to domain experts, who may use them to
derive actionable recommendations. Successful applications of data
mining include the analysis of genetic patterns, graph mining in
finance, and the analysis of consumer behavior in marketing.
The Institute of Information Systems has developed and researched a
wide spectrum of data mining applications with a focus on web
applications in education, B2C retail applications, and knowledge
management. One focus is on the analysis of the web as "the world's
largest database." In particular, we analyze and develop methods and
tools for the exploratory analysis of behavioral data. Another area of
interest is the transition from temporal data analysis, which still
plays an important role but implicitly assumes that the described
domains are stationary, to the analysis of the dynamic aspects of such
data. As a rule, these data are too complex to examine with standard
time series analysis techniques.
Web mining describes the application of traditional data mining
techniques to web resources and has facilitated the further
development of these techniques to take the specific structures of
web data into account. The analyzed web resources comprise (1) the web
sites themselves, (2) the hyperlinks connecting these sites, and (3)
the paths that online users take through the web to reach a particular
site. Web usage mining then refers to the derivation of useful
knowledge from these data inputs.
The content of the raw data for web usage mining on the one hand, and
the expected knowledge to be derived from it on the other, pose a
special challenge. While the input data are mostly web server logs and
other primarily technically oriented data, the desired output is an
understanding of user behavior in domains such as online information
search, online shopping, and online learning. This requires, on the one
hand, an understanding and formal modeling of the behavior examined in
the domain and, on the other, an account of how the input data fit into
these models. We are investigating "semantic web" approaches as a
promising avenue for the formal and computational aspects of this goal.
The content-related aspects of this goal require an understanding of
behavioral theories in the investigated domains and a highly
interdisciplinary research approach. The eventual presentation of the
mining results to domain experts should take into account general
principles of user interface design as well as domain-specific
conventions. Further, the development of visualizations as an important
design element of user-oriented mining systems is a focus of our
research efforts.
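The step from web server logs to behavioral data described above can be illustrated with a minimal Python sketch. It is not the institute's tooling. It assumes that log lines follow the Common Log Format and uses a 30-minute inactivity timeout, a common heuristic, to split each host's requests into sessions. The names `parse_line` and `sessionize` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: lines follow the Common Log Format; real server logs may differ.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_line(line):
    """Return (host, timestamp, path) for a well-formed line, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), TIME_FORMAT)
    return match.group("host"), timestamp, match.group("path")


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group requests per host and split them into sessions wherever two
    consecutive requests are more than `timeout` apart (a heuristic)."""
    requests_by_host = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:  # silently skip malformed lines
            host, timestamp, path = parsed
            requests_by_host[host].append((timestamp, path))

    sessions = []
    for host, requests in requests_by_host.items():
        requests.sort(key=lambda request: request[0])
        current = [requests[0][1]]
        for (prev_time, _), (time, path) in zip(requests, requests[1:]):
            if time - prev_time > timeout:
                sessions.append((host, current))
                current = []
            current.append(path)
        sessions.append((host, current))
    return sessions
```

Each resulting session is a sequence of requested paths per host. Identifying users by host alone is a simplification: proxies and shared machines blur this mapping, which is one reason why the interpretation of such data calls for the behavioral models discussed above.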
User behavior and data availability tend to change over time.
Therefore, the dynamics of a domain are an important consideration in
every mining analysis and in each presentation of mining results to
domain experts. Most data mining algorithms treat the dataset being
analyzed as a static unit. However, a dataset may change in content
and/or structure over time, either because of updates or simply because
the data were collected over a long period. For updates, it seems
sufficient to update the previously discovered patterns. Most of the
"incremental mining techniques" proposed for this task are based on
their static counterparts and reuse information from earlier mining
runs to update the patterns. The
collection of data over a long period creates a different situation. In
this case the data undergo only one form of update: the insertion of
new data. The distribution of entities in the dataset can change
because of external and/or internal factors. As a result, the
discovered patterns may also change over time (pattern evolution).
There are two types of pattern change: changes in the essential makeup
of a pattern, for example in the relationship in the data that the
pattern reflects, and changes in the statistical measures of the
pattern. Both types of change can significantly influence the decision
process and hence should be monitored. Pattern monitoring requires a
data model with a temporal component that associates each pattern with
the period in which it was observed. A second
question that arises immediately is which patterns should be monitored.
Even for small amounts of data, the number of discovered patterns is
often very large. In such cases the analyst must choose a manageable
subset of the patterns. Our research focuses on formal descriptions of
pattern evolution and monitoring, the development of efficient
algorithms for these tasks, and the implementation of suitable
tools.
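Observing changes in the statistical measures of a pattern over time, as discussed above, can be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions. It takes an association rule as the pattern, uses support and confidence as its statistics, and flags a change when either value moves by more than a fixed threshold between consecutive periods. The function names and the threshold test are hypothetical and are not the method of the publications listed on this page.

```python
from collections import defaultdict


def rule_statistics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    total = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    with_both = sum(
        1 for t in transactions if (antecedent | consequent) <= set(t)
    )
    support = with_both / total if total else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


def monitor_rule(timestamped_transactions, antecedent, consequent,
                 threshold=0.1):
    """Compute the rule's statistics per period and flag each period whose
    support or confidence differs from the previous period's value by more
    than `threshold` (an assumed, simplistic change test)."""
    by_period = defaultdict(list)
    for period, items in timestamped_transactions:
        by_period[period].append(items)

    history = []
    previous = None
    for period in sorted(by_period):
        support, confidence = rule_statistics(
            by_period[period], antecedent, consequent
        )
        changed = previous is not None and (
            abs(support - previous[0]) > threshold
            or abs(confidence - previous[1]) > threshold
        )
        history.append((period, support, confidence, changed))
        previous = (support, confidence)
    return history
```

The returned history is a small temporal data model: each entry ties the pattern's statistics to the period in which they were observed. A fixed threshold is the simplest possible test. A statistical significance test would be a natural refinement.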
This area is closely related to knowledge management, data protection,
and data security. In particular, questions from knowledge management
are highly relevant because use of the web usually implies access to
information and therefore the construction of knowledge. This raises a
number of e-privacy questions. Data collection and data analysis
practices are coming under increasing scrutiny from legislation and
technical proposals that aim either at minimizing data recording or at
extending it.
Researchers in this Area
Prof. Dr. Bettina Berendt
Prof. Oliver Günther, Ph.D.
Maximilian Teltzrow
Selected Publications
Baron, S., Spiliopoulou, M., Günther, O.: Efficient Monitoring of
Patterns in Data Mining Environments. In Proc. Seventh East-European
Conference on Advances in Databases and Information Systems (ADBIS
2003), Dresden, Germany. Springer, 2003
Berendt, B.: Using site semantics to analyze, visualize, and support
navigation. Data Mining and Knowledge Discovery, 6, 37-59, 2002
Berendt, B., Brenstein, E.: Visualizing Individual Differences in Web
Navigation: STRATDYN, a Tool for Analyzing Navigation Patterns.
Behavior Research Methods, Instruments, & Computers, 33, 243-257,
2001
Berendt, B., Spiliopoulou, M.: Analyzing navigation behaviour in web
sites integrating multiple information systems. The VLDB Journal, 9,
56-75, 2000
Spiliopoulou, M., Pohle, C., Teltzrow, M.: Modelling Web Site Usage
with Sequences of Goal-Oriented Tasks. In Proc. Multikonferenz
Wirtschaftsinformatik, in: E-Commerce - Netze, Märkte, Technologien.
Physica-Verlag, Heidelberg, 2002