Centre for Data Science and Analytics (CDSA)

Led by Prof. Dr Lim Tong Ming

Location: Big Data Analytic Centre, Block SA, TAR UMT Main Campus

i. Objectives

The Centre for Data Science and Analytics (CDSA) aims to establish Tunku Abdul Rahman University of Management and Technology (TAR UMT) as the leading centre for big data research and development in Malaysia and in the Asia Pacific region. The centre aims to provide Big Data infrastructure and Big Data Analytics computing resources to collect, manage, filter and generate usable large data sets. These data sets yield insights into patterns, trends and perceptions, and support predictions and forecasts that inform good decisions in business and scientific activities.

ii. Vision Statement

To be the leading research centre in Big Data Analytics (BDA) research and development for academics and industry, generating higher productivity and accurate, reliable insights in Malaysia and the Asia Pacific region.

iii. Rationale and Research Plan

Rationale

The rationale for setting up the centre is to support activities in the area of big data by collaborating with researchers from interdisciplinary areas of study. Initial collaboration will be with researchers from within TAR UMT faculties such as FAFB and FCCI. Simultaneously, the centre will intensify academic-industry research and development activities. The centre will provide talent, guidance and expertise in the technical field of big data to undergraduate students working on final year projects. Meanwhile, the lab established by the centre will provide R&D resources for postgraduate candidates to explore ground-breaking research. These projects will be supervised by professors and academics from the centre. The most important goal of the centre is to research and develop industry-driven systems for the good of the nation and society. Furthermore, BDA research outcomes can be filed as commercial intellectual property (IP). Any income generated from commercial interests such as licensing and computer applications can be used to supplement the university college’s income and to make the lab self-sustaining. In return, researchers will build their reputation and be rewarded from the income generated.

Research Plan

The Big Data Analytics (BDA) Lab established by the Centre for Data Science and Analytics provides four (4) key components: a Big Data Solicitor that collects real-time data streams, a Big Data Hadoop (multi-node) cluster, modelling servers and Big Data visualization. The BDA Lab is capable of providing essential computing resources for Big Data related teaching and learning needs in undergraduate as well as postgraduate programmes. It is also able to undertake industry-driven research projects to design and develop proof-of-concept prototypes prior to production-grade deployment. The centre is designed to support multi-faceted, interdisciplinary research projects, including Smart Campus, Agriculture 4.0, Industry 4.0, Education 4.0 and national-level Big Data related activities.

The Centre is tasked with producing postgraduates who are skilled in the area of Big Data Analytics. A group of high-profile researchers in several selected research directions will take part in Big Data related activities to produce solutions for industry collaborators. In this effort, the Centre at TAR UMT will intensify the development of Big Data Analytics expertise as a talent pool for the industry. At the same time, the Centre for Data Science and Analytics will strategically produce patents and commercialize Big Data related solutions and licenses to generate income. Researchers and students in the area of Big Data will be encouraged to set up start-up companies as part of the entrepreneurship and incubation agenda of the UC. TAR UMT tasks the Centre with taking up research projects, undertaken jointly by academics and industry, that close the gaps found in the industry, business and scientific communities, so that the process of solution design promotes mutual cross-learning between academics, students and industry practitioners.

iv. Research Centre, Research Group Leader and Members

Research Centre Chair – Prof. Dr. Lim Tong Ming

Big Data is the oil of the decade. Farris (2012) and The Economist (2017) claimed that the oil of the new digital economy is no longer crude oil but data. Advances in Artificial Intelligence (AI), compact and powerful mobile equipment such as cell phones and wearable devices, social media platforms such as Facebook, Twitter and blogosphere sites, and Internet of Things (IoT) devices such as sensors and detectors are driving data complexity and new forms and sources of data. Big Data analytics is the use of advanced analytic techniques, such as deep learning algorithms (deep neural networks, deep belief networks and recurrent neural networks), on very large and diverse data sets of many petabytes. These data sets include structured data (such as tables in Oracle databases), semi-structured data (such as CSV and JSON files that are not stored in an RDBMS) and unstructured data (such as e-mail, tweets, social media messages, documents, videos, photos, audio files, presentations, blogs and web pages), drawn from multiple sources in different sizes to yield insights for planning and decisions. Much of this data is generated in real time.

Big Data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage and process with low latency. In addition, it has one or more of the following characteristics: high volume, high velocity, high variety, high veracity and high complexity.

Analyzing Big Data allows analysts, researchers, and business users to make better and faster decisions using data that was previously inaccessible or unusable. Using advanced analytics techniques such as text analytics, machine learning, predictive and prescriptive modelling, data mining, statistics, and natural language processing, businesses and scientists are able to analyze previously untapped data sources independently or together with their existing enterprise data to gain new insights resulting in better and faster decisions.

There are five (5) research groups in CDSA:

(a) Text and Sentiment Analysis Group

Research Group Leader – TBA

Location: Big Data Analytic Centre, Block SA, TAR UMT Main Campus

Text analytics is the process of analyzing unstructured text, extracting relevant information, and transforming it into useful business intelligence. Sentiment analysis determines if an expression is positive, negative, or neutral, and to what degree. In other words, text analytics studies the face value of the words, including the grammar and the relationships among the words. Simply put, text analytics gives you the meaning. Sentiment analysis gives you insight into the emotion behind the words.

The research group is currently focused on text analytics and sentiment analysis. The group is tackling difficult research problems such as analysing social media messages composed in Chinese, Malay and English across multiple industry domains (F&B, fashion, politics and cosmetics). The group has worked to extend lexicons for multiple domains by adopting SenticNet, WordNet and WordNet-Affect. The research outcomes include user sentiment, latent topics, and fake news or deception detection derived from customer data. These are critical components of successful customer experience management for businesses.
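As a rough illustration of lexicon-based sentiment scoring, the sketch below averages the polarity of known words in a message. The lexicon, its scores and the thresholds are hypothetical and chosen only for illustration; they are not taken from SenticNet, WordNet or the group's own lexicons, and Chinese text would additionally need word segmentation, which is not handled here.

```python
import re

# Hypothetical toy lexicon: word -> polarity in [-1, 1].
# Real lexicons are far larger; these entries and scores are made up.
TOY_LEXICON = {
    "good": 0.7,
    "delicious": 0.9,
    "bad": -0.7,
    "slow": -0.4,
    "sedap": 0.8,   # Malay: "tasty" (illustrative score)
    "teruk": -0.8,  # Malay: "terrible" (illustrative score)
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def sentiment(text, lexicon=TOY_LEXICON, threshold=0.1):
    """Return (label, score), where score is the mean polarity of matched words."""
    hits = [lexicon[token] for token in tokenize(text) if token in lexicon]
    if not hits:
        # No lexicon word found: treat the message as neutral.
        return "neutral", 0.0
    score = sum(hits) / len(hits)
    if score > threshold:
        return "positive", score
    if score < -threshold:
        return "negative", score
    return "neutral", score


if __name__ == "__main__":
    for message in ["The food was delicious and good", "Servis teruk dan slow"]:
        print(message, "->", sentiment(message))
```

A mixed-language message is scored the same way as a single-language one, which is why extending the lexicon per language and per domain matters for this approach.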

i) Big Data Localised Lexicon Sentiment and Emotion Insights

Members: Prof. Dr. Lim Tong Ming, Dr. Tan Chi Wee, Ms. Kathleen Tan Swee Neo, Mr. Lim Kong Hua

(b) Web Mining Group

Research Group Leader – Prof. Dr. Lim Tong Ming

Member – Lee Seah Fang

Location: Big Data Analytic Centre, Block SA, TAR UMT Main Campus

Web mining is a data mining technique for discovering patterns on the Web. There are three sub-disciplines in Web mining: Web usage mining, Web content mining, and Web structure mining. Web usage mining is the process of mining usage patterns from web data such as the user access patterns. Web content mining, sometimes called web text mining, involves mining the content of a web page so as to discover useful information or knowledge. Finally, Web structure mining is the analysis of the structure of nodes and connections of a web site. It involves mining either the relationships between web pages containing hyperlinks or the structure within a web page document.
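Web structure mining can be sketched in a few lines: extract the hyperlinks from each page, then count how many other pages link to each target. The three-page site below is hypothetical, and the in-link count is only one simple structural measure, not the group's actual method.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the list of link targets found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def in_degree(pages):
    """Count how many distinct other pages link to each target."""
    counts = Counter()
    for source, html in pages.items():
        for target in set(extract_links(html)):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


# Hypothetical three-page site used only for illustration.
PAGES = {
    "/home": '<a href="/shop">Shop</a> <a href="/about">About</a>',
    "/shop": '<a href="/home">Home</a> <a href="/about">About</a>',
    "/about": '<a href="/home">Home</a>',
}

if __name__ == "__main__":
    for page, count in in_degree(PAGES).most_common():
        print(page, count)
```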

Currently, the Web Mining group is working on two fronts. First, the group is working on techniques for mining websites of commercial interest. Commercial websites exist in various layouts and formats, which poses problems for large-scale data collection from such websites. Second, a website's layout and design can positively or negatively influence the accessibility of a page or its commercial value, for example where the layout affects the click rate on advertisements. We will examine the use of graph data mining to identify a set of specific patterns that favorably influence the advertisement click rate, and use these to analyse the likely effectiveness of different web page layouts or their particular features.

(c) Audio, Image and Video Analytics Group

Members: Prof. Dr. Lim Tong Ming, Assoc. Prof. Ts. Dr. Tew Yiqi, Dr. Tan Chi Wee

Research Group Leader – Dr. Tan Chi Wee

Member – Dr. Lim Khai Yin

Location: Big Data Analytic Centre, Block SA, TAR UMT Main Campus


Image, audio and video analysis includes any technique capable of extracting high-level information from the data, i.e. information that is not explicitly stated and that requires an abstraction process to obtain. The group uses machine learning techniques and statistical approaches to carry out many industry research experiments. The group works on industry-driven tasks such as:


1. Reveal what people in audio recordings said by diminishing background noise and enhancing the conversation.

2. Conduct photogrammetric analysis on photos or videos to determine dimensions of objects and site features.

3. Enhance images and video to better reveal their contents and to develop trial animations and exhibits.

4. Conduct sound level analysis around businesses to determine neighbourhood sound levels.

5. Document physical evidence during site and lab inspections.

6. Analyse surveillance video to aid event analysis and crash reconstruction.

7. Process and enhance surveillance video to aid analysis and testifier presentations.


The group is also working on detecting emotion by analysing streaming audio for support centers in banks and information technology companies. For many business corporations, it is critical to understand multiple languages (in a country such as Malaysia), to automatically detect the emotional state of a conversation in progress, and to provide intelligent responses to customers on the other end of the phone. This is because businesses believe that acquiring new customers costs more than retaining existing ones.

(d) Predictive and Forecast Modelling Group 

Research Group Leader – Dr Lim Khai Yin

Members: Prof. Dr. Lim Tong Ming, Ts. Tan Wai Beng, Ts. Jessie Teoh, Ms. Choon Kwai Mui

Location: Big Data Analytic Centre, Block SA, TAR UMT Main Campus 

Prediction is the estimation of an event in the past, present or future, for example, predicting the percentage of house owners who buy home insurance. Forecasting, on the other hand, is always associated with a time dimension in the future: it involves estimation for a specific future duration or over a period of time, for example, forecasting Apple's total sales in July 2017. Forecasting is a subset of prediction. This research group proposes, designs and experiments with models based on structured and unstructured data from machine sensors, such as CCTV and the corrugators of production lines, and on social media data from platforms such as Facebook and Instagram, to provide recommendations and insights for faster and more accurate decision making. The group will use R, Python, KNIME and RapidMiner to build predictive and forecasting models, analyse model outcomes, evaluate and measure model performance, and present the outcomes against project goals for users in the business and scientific communities. Combinations of supervised and unsupervised machine learning and statistical approaches will be considered and evaluated. Structured data sets will be fused with variables derived from unstructured social media data, and these variables will be correlated and studied to determine their importance before being used in the models to be developed. The group has been working with digital marketing and social media listening companies since it was set up in the Centre.
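The distinction between prediction and forecasting can be shown with a minimal sketch. The first function estimates a proportion with no time dimension; the second forecasts the next period of a time series using simple exponential smoothing. The data, the function names and the smoothing factor are assumptions made for illustration and are not the group's models.

```python
def predict_proportion(outcomes):
    """Prediction without a time dimension: the share of positive cases.

    `outcomes` is a list of 0/1 flags, e.g. 1 if a house owner bought insurance.
    """
    return sum(outcomes) / len(outcomes)


def forecast_next(series, alpha=0.5):
    """Forecast the next period with simple exponential smoothing.

    `series` is ordered in time; `alpha` (0 < alpha <= 1) weights recent values.
    """
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the running level.
        level = alpha * value + (1 - alpha) * level
    return level


if __name__ == "__main__":
    # Hypothetical survey of eight house owners (1 = bought home insurance).
    bought_insurance = [1, 0, 1, 1, 0, 0, 1, 0]
    print("Predicted share buying insurance:", predict_proportion(bought_insurance))

    # Hypothetical monthly sales figures, oldest first.
    monthly_sales = [100, 110, 120, 130]
    print("Forecast for next month:", forecast_next(monthly_sales))
```

With these made-up inputs the predicted share is 0.5 and the forecast for the next month is 121.25. The forecast depends on the order of the observations, whereas the proportion does not, which is the time dimension the paragraph above describes.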

(e) Statistical Analysis and Survey Group 

Research Group Leader – Ms Fong Wai Sham

Members: Mr Chee Keh Niang, Ms Lee Shu Gyan, Ms Chong Voon Niang, Ms Yap Saw Teng, Ms Loo Bee Wah, Mr Chong Kam Yoon, Dr Christopher Lazarus (Sabah branch), Ms. Tan Peck Yen (Johor branch), and any interested relevant academic staff who may be appointed from time to time.

Location: Laboratory with SPSS software, Block B, TAR UMT main and branch campuses.

This research group conducts research related to the theory and application of data science and statistics; provides support and infrastructure to its members for solving data-centric and data-intensive research problems; provides consultancy and quantitative analysis services to other education or research institutions as well as to industry; and conducts training and workshops. The group’s vision is to become a reputable, well-trusted statistical research and consultancy unit whose services are sought after by industries, commercial firms and other organisations in the community.

The group plans to offer an integrated, comprehensive statistical consulting service covering all aspects of a quantitative research project, ranging from the initial study design through to the presentation of the final research conclusions. The Centre will work cooperatively with other established centres across the University College and also conduct research on social, economic and political issues using survey data from large, representative national samples.

v. Industry Projects