Index

Note: Page numbers followed by b indicate boxes, by f indicate figures, and by t indicate tables.

A

ACID transactions 142

Adaptive CUSUM algorithm 320–322, 321f

Adjusted Rand index (ARI) 373

Affinity propagation 372

All-sources BFS (AS-BFS) on GPU 121

algorithms for accelerating 125–126

performance study of 126–128

Alphabet-based LD 65–66, 65f

ANN See Artificial neural network (ANN)

Anomaly detection 85–88

accuracy and time to detect 88

Grubbs test 86–87

Kalman filtering 88

text data streams 88

Tietjen-Moore test 87

Anonymization 298–300, 298–299f

data releasing 298–299

social networks 299–300

Antenna movement model, CBID system 322–323, 323f

Anticipatory fetching 42

Apache Hadoop framework 342, 350–351

Apache Software Foundation (ASF) 20

ApplicationMaster 163–164

ARI See Adjusted Rand index (ARI)

Artificial neural network (ANN) 96–98, 97–98f

AskDOM 341–342, 342f, 350, 350f

Aspect-based sentiment analysis 74, 74f

AS-Skitter graph, decomposition 131f

Association rule mining 391

Attack model 288–289, 289f

Autoencoders 99–100, 100f

B

Backpropagation (BP) algorithm 100–101, 101–102f

Barabási-Albert random graph model 127

Barrierless MapReduce 218

Base transceiver station (BTS) 309

Bayesian differential privacy 303–304

Big Data

alternative platforms for 33t

batch processing 215

stream processing 215

tools 216, 216t

business intelligent domain 11–13

comprehensive meaning 35f

methodology for 7, 8f

motivations 10–11

3² Vs definition 13

3Vs definition (Gartner) 7–8, 9f

4Vs definition (IBM) 8, 9f

6Vs definition (Microsoft) 8–9, 9f

Big Data analytics (BDA) 14

case study 357–384

Binary Large Objects (BLOB) 150

Bin packing problem 255

Biomedical data, NER 71

Block, CUDA 107

Bolt process 47

Bonneville Power Administration (BPA) 417–418, 423

Breadth-first search (BFS)

complex networks 124

BTS See Base transceiver station (BTS)

Business intelligent (BI) domain 11–13

Byte n-gram-based LD 66–67

C

Caffe See Convolutional architecture for fast feature embedding (Caffe)

Call data records (CDRs) 391

Capacity Scheduling 220–221

CBID system See Customer behavior identification (CBID) system

CDN See Content delivery network (CDN)

CDRs See Call data records (CDRs)

Cellular network, video-on-demand 392–393

Centrality metrics 123

CEP See Complex event processing (CEP)

CF-based recommender systems 81

Classical Gilbert graph model 127

Cloud computing (CC) 285–286

resource management

desired resource allocation properties 166–167

free riding 171–172

gain-as-you-contribute fairness 171–172

long-term resource allocation policy 168–170

multiresource fair sharing issues 174–175, 174t

reciprocal resource fairness 172, 175–179

resource allocation model 172–174, 173f

resources-as-you-pay fairness 168

sharing incentive 171–172

strategy-proofness problem 167

trivial workload problem 167

scheme for 290, 290f

secure queries in 286–295

Cluster algorithms 371–372, 372f

Clustering-based opinion summarization 340, 344–348, 345f

Clustering metrics 123

Clusters, ranking 80–81

Coarse-grained propagation model 317

Collaborative filtering analysis 391, 398, 400–401

Community cloud See Federated cloud

Community structure 122

Complex event processing (CEP) 44–45

EventSwarm service 57

for financial market data processing 55–58

Complex networks 119

algorithms for accelerating 125–126

performance study of 126–128

characterization and measurement 121–123

heterogeneous computing

graph partitioning for 129

HPC traversal 124–125

k-core 129–133

patterns on 123t

Compositional sentiment analysis 75

Compute Unified Device Architecture (CUDA) programming 105–107, 107f

Conditional random field (CRF) 70, 70f

Consistent hashing 151

Content-based recommender systems 81

Content delivery network (CDN) 257

Conventional machine learning model 98

Convolutional architecture for fast feature embedding (Caffe) 104

convolution, parallel implementation 109–111, 110f

CUDA programming 105–107

data storage 107

development 104t

execution mode of 108f

layer communication 108f

layer topology in 107–109

LeNet topology in 109f

Convolutional neural networks (CNNs) 101–102

architecture overview of 102, 103f

convolutional layer 103

full connection layer 104

input layer 102

local connectivity 103

pooling layer 103, 104f

Correlation analysis 286

CBID system 322–328

differential privacy 302–304

privacy 296–304, 297f

COTS HPC system 114

CPU resource management 162

Create, read, update, and delete (CRUD) operations 142

CRF See Conditional random field (CRF)

CRM and movie-watched information 391

Crowdsourcing techniques 309–311

CUDA programming See Compute Unified Device Architecture (CUDA) programming

Customer behavior identification (CBID) system 319–328

explicit correlation 322–324

antenna movement model 322–323, 323f

IMR 323–324, 325f

implicit correlation 325–328, 327f

iterative clustering algorithm 326–328, 328f

problem formulation 325–326

segment-based interpolation approach 326

objectives 320f

popular item 320–322

D

data processing tools 53

Data and Opinion Mining (DOM) 339–340

conceptual framework 342f

core functions 342

implementation 350–351

Core Service 351

server section 350–351

system architecture 341–342

Database Management System (DBMS) 139

future directions 156–157

navigational databases 139–140, 140f

hierarchical model 140

network model 140

NoSQL See Not only SQL (NoSQL)

relational data models 140–143

data modeling process 141

join operations 141–142, 142f

query language 140

relational algebra 140

schema normalization 141–142

tabular organization 141, 141f

transactions 142

two-phase commit 142–143

Database management systems 53

Data cleansing 53

Data collection 53

Data mining See Knowledge discovery in database (KDD)

Data-model parallelism 114, 115f

Data parallelism 113, 113f

Data preprocessing 351

human object estimation 329–330

Data processing engine

phasor measurement unit 427

Data stream analytics platforms 41

programmatic EPSs 50–52

query-based EPSs 48–49

rule-oriented EPSs 49–50

Data streaming 240–241, 243

Data stream processing 44

Amazon Kinesis 48

Hadoop ecosystem 45–46

platforms 40–41

Data transformation 53

DCT See Discrete cosine transform (DCT)

Declarative optimization engine, IaaS clouds 449–451

Deep learning

application background 95

artificial neural networks 96–98

autoencoders 99–100

backpropagation 100–101

Caffe See Convolutional architecture for fast feature embedding (Caffe)

learning speed 116

scalability 116

streaming data 116

training samples 115–116

concept of 98–99

convolutional neural network 101–104

DistBelief 111–112

and multi-GPUs 112–114

parallel frameworks 96

performance demands for 96

Degree centrality 349

Degree metrics 122

Density-based spatial clustering of applications with noise (DBSCAN) cluster algorithm 372

dimensional reduction analysis 382–383, 382–383t, 383–384f

pair variable analysis 383, 384t, 385f

Device-based sensing approaches 310–319

floor plan and RSS readings mapping 314–317

unsupervised mapping 315–317

graph matching based tracking 318

overview 310–311

RSS trajectories matching 311–313, 312f

directional shadowing problem 311

fingerprints extraction 311–313

fingerprints transition graph 313, 314f

user localization 318

Device-free sensing approaches 310, 319–334

customer behavior identification 319–328

explicit correlation 322–324

implicit correlation 325–328

popular item 320–322

human object estimation 328–334

data preprocessing 329–330

feature extraction 330–333

machine learning-based estimation 333–334

Dictionary-based LD 66, 66f

Differential privacy 300–304

Bayesian 303–304

definitions 300

Gaussian correlation model 304

for histogram 302

K-means clustering 302

optimization 300–301

PINQ framework 302, 303f

Digital watermarking 295–296

Dijkstra’s algorithm 373

Direction-of-Arrival (DoA) detection 311

Discrete cosine transform (DCT) 330, 333

Distance metrics 123

DistBelief 111–112

DLLs See Double linked lists (DLLs)

DoA detection See Direction-of-Arrival (DoA) detection

Document pivot method 77–78

Documents embedding 155

DOM See Data and Opinion Mining (DOM)

Domain adaptation

text mining 76–77

Dominant resource fairness (DRF) 222–223

Double linked lists (DLLs) 254

Downpour SGD 111, 112f

Drag model 256, 261–262, 261f

Drop model 256, 261–262, 261f

E

EDRs See Event data records (EDRs)

EGI Federated Cloud Task Force 438

Encrypted cloud data 285–286

search over architecture 287f

secure queries over 287–295

attack model 288–289, 289f

index-based secure query scheme 290–295

system model 287

threat model 288, 288f

head node 293, 293f

intermediate nodes 292, 292f

secure inner product preserving 294, 295f

eScience

cloud computing 431–432, 435–436, 436f

grid-based 434–435

Event

pattern detection 52

processing system 44, 52t

real-time analytics 43

Event-condition-action (ECA) rules 50

Event data records (EDRs) 391, 400–401

Event pattern 45

for duplicate dividends 56t

for earnings calculation 56t

Event processing languages (EPLs) 44–45

Event stream processing 44

EventSwarm software framework 50–51, 51f

Explicit social links 83

Exponentially weighted moving average (EWMA) 86

Extraction, transformation, and load (ETL) 3

Extract n-grams 79–80

F

Fair resource sharing

Hadoop framework 191–192

TaskTracker 191–192

Feature pivot method 77–78

Federated cloud 438

Filter, EPS 51–52

Finance domain requirements

data pre-processing 55f

real-time analytics in 54–55

First-in-first-out (FIFO) scheduling algorithm 220–221

First Normal Form (1NF) 141

Flash technology 42

FlatLFS 224–225

Friis Equation 322

G

Gartner’s interpretation See 3Vs of Big Data

Gaussian correlation model 304

GFS See Google File System (GFS)

GIG See Grid Infrastructure Group (GIG)

Global positioning system (GPS) 309, 417

Google File System (GFS) 20–23

architecture 22f

GPS See Global positioning system (GPS)

Graph API 340–341

Graph-based n-gram approach (LIGA) 65

Graphics processing units (GPUs) 124

architecture of 105–106

algorithms for accelerating 125–126

performance study of 126–128

performance 105f

simplified architecture of 106f

Graph matching algorithm 315

corridor points matching 317, 317–318f

graphs normalization 316

rooms points matching 317, 318f

skeleton graph extraction 315

skeletons matching 316

Graph-matching-based tracking 318

Graph partitioning strategy 120

for heterogeneous computing 128–129

Grid-based eScience 434–435

Grubbs test 86–87

H

Haar cascade algorithm 364, 364f

Hadoop 163–164, 216

availability optimization 232

disadvantages 20

distinguishing features 33

ecosystems 32–33

efficiency optimization

fault tolerance 231

flow mechanism 231

MapReduce computation models 231

Matchmaking 231

prediction-execution strategy 232

framework 19f, 217, 217f

application optimization 229

framework 228–229

load balancing 229–230

read-and-write optimization 230

security enhancements 226–228

small file performance optimization 224–226

job management framework 223

job scheduling mechanism

BalancedPool algorithm 221–222

capacity scheduling 220–221

dominant resource fairness 222–223

FIFO scheduling algorithm 220–221

HFS scheduling algorithm 220–221

key functions 33

Lucene 25–27, 26f

scalability 31–32

scale-up and scale-out 19

Hadoop Distributed File System (HDFS) 20–23, 239, 440

architecture 22f

real-time analytics 46

security enhancements

authorization 226

certification 226

data disaster recovery 226

novel method 227

token-based authentication mechanisms 226–227

small file performance optimization

FlatLFS 224–225

Har filing system 224–225

hierarchy index file merging 225–226

issues and solutions 224

MSFSS 224–225

SFSA strategies 224–225

SmartFS 225–226

write/read limits 242

Hadoop Fair Scheduling (HFS) algorithm 220–221

Hadoop/MapReduce 239

performance bottlenecks 241–243, 241f

bulk storage 242

under parallel loads 243–244, 243f

shared memory 242, 244–245, 245f, 248–250

storage 244–248, 245f

Hadoop schedulers 190

HaLoop 218–219

Hard disk drives 140

application optimization 229

framework 228–229

load balancing 229–230

read-and-write optimization 230

parameter spaces 245f

HDFS See Hadoop Distributed File System (HDFS)

Heterogeneous computing

graph density 129

graph partitioning for 129

partitioning 128

Hierarchical clustering 339–340

High-frequency algorithmic trading 54

High-performance computing (HPC) 434, 437, 441

Big Data processing and 241–242, 241f

NoSQL graph databases 120

traversal of large networks 124–125

Hill-climbing method 339–340

Histogram query, differential privacy for 302

Hive, real-time analytics 46

Hotspot distribution 242, 256–258, 258f

HPC See High-performance computing (HPC)

Human object estimation 328–334

data preprocessing 329–330

feature extraction 330–333

machine learning-based estimation 333–334

Hungarian algorithm 316

Hybrid cloud 438

I

IDC algorithm See Iterative database construction (IDC) algorithm

Implicit social links 83

IMR See Integration of Multi-RSS (IMR)

INCA See Intelligent network caching algorithm (INCA)

INCA caching algorithm 401–402

Incremental evaluation 42

Index-based secure query scheme

for cloud computing 290, 290f

implementations 291–295, 291–295f

Index-free adjacency technique 153–154

InfiniteGraph 120

Information explosion 4

In-memory processing 42

Integration of Multi-RSS (IMR) 323–324, 325f

Intelligent network caching algorithm (INCA) 390

cache hits 410, 410f

vs. online algorithm 407

optimization problem 389–390, 403–404

performance 410–411

with prefetch bandwidth 407–408, 408f

satisfied users 412, 413f

Interleave MapReduce scheduler

slot manager 196–197

task dispatcher

map task scheduling 197

reduce task scheduling 197

task slot 196, 196f

Internet of Things (IoT) devices 309–310

device-based sensing approaches 310–319

evaluation 318–319

floor plan and RSS readings mapping 314–317

graph matching based tracking 318

overview 310–311

RSS trajectories matching 311–313, 312f

user localization 318

device-free sensing approaches 310, 319–334

customer behavior identification 319–328

human object estimation 328–334

Intertenant resource trading (IRT) 175–178, 176f, 177b

Intratenant weight adjustment (IWA) 176f, 178–179, 178b

Inverted index 292–294

structure 291–292, 291f

IoT devices See Internet of Things (IoT) devices

Isomap method 373

Iterative clustering algorithm

with cosine similarity 326–328

Iterative database construction (IDC) algorithm 301

J

Jaccard similarity 80

K

Kahn process networks (KPNs) 218

Kalman filtering 88

K-core-based complex-network unbalanced bisection (KCMax) 129–133

AS-Skitter graph decomposition 131f

dense partition produced by 132t

sparse partition produced by 132t

K-means clustering 76–77, 302

Knowledge discovery in database (KDD) 16

L

Label bias problem 70

Lambda architecture 29

elements of 31f

implementation 32f

process steps of 31f

Language detection (LD)

alphabet-based LD 65–66, 65f

byte n-gram-based LD 66–67

combined system 67–68, 68f

dictionary-based LD 66, 66f

graph-based n-gram approach 65

n-gram-based approach 64

user language profile 67

Language identification See Language detection (LD)

Laplace-Beltrami eigenvalues (LBE) 316

Large dataset 96

Large-scale deep networks 96

Large Synoptic Survey Telescope (LSST) 431

Latent Dirichlet allocation (LDA) 74

LBE See Laplace-Beltrami eigenvalues (LBE)

LBS See Location-based services (LBS)

LD See Language detection (LD)

Lexicon-based approach 73

Load balance 125

Locality sensitive hashing (LSH) 78

Local resource shaper (LRS)

architecture 194f

Capacity scheduler 211

Delay scheduler 211

design philosophy 194

Hadoop schedulers 190

Hadoop 1.X experiments 198–204

Hadoop 2.X experiments 204–210

Hadoop YARN 191

Interleave 190, 194–198

interleave MapReduce scheduler

slot manager 196–197

task dispatcher 197–198

task slot 196, 196f

MapReduce benchmarks 190, 191t

resource consumption shaping 210

Splitter 190, 194–195

VM placement and scheduling strategies 210

Location-based services (LBS) 309

Lockfree design 242, 254–255

Lockfree shared memory design 240–241

Logistic regression (LR) 368–369, 369t, 370f

Long-term resource fairness (LTRF)

cloud computing

experimental evaluation 170, 171f

motivation example 168

scheduling algorithm 168–170

Lower control limit (LCL) 86

LR See Logistic regression (LR)

M

Machine learning (ML) 358, 360–373

classification process in 98f

process 15–17, 16f

tweets sentiment analysis 361–369

classifier models 365

color degree feature 363

feature engineering 362

logistic regression 368–369, 369t, 370f

Naïve Bayes as baseline 362

in pattern module 363f

preprocessing 362

random forest 366–368, 367t, 369f

score feature 363

smile detection feature 364, 364f

support vector machine 365–366, 366–367f

training set 362

Machine learning-based estimation 333–334

Manifold algorithm 373

MapReduce

Barrierless MapReduce 218

comparison of 219, 219t

HaLoop 218–219

load balancing mechanism 220

Map-Reduce-Merge 218

real-time analytics 46

stream-based 218–219

task scheduling strategy 219

MapReduce framework, XDOM 342–343

MapReduce-like models 120

Map-Reduce-Merge 218

Markov decision processes (MDP) 390, 393–394, 394f, 396

Markov predictive control (MPC) 390, 396

Maximum entropy (ME) models 69

MDP See Markov decision processes (MDP)

Mean absolute error (MAE) 84

Memory-based recommender systems 82

Memory-based social recommender system 83

Memoryless resource fairness (MLRF) 166

Memory Map method 253–254

Memory resource management 162

Message passing interface (MPI) technology 242

Minkowski distance 312

MIPS See Morphological Image Processing-based Scheme (MIPS)

ML See Machine learning (ML)

Mobile devices 389

Model-based recommender systems 81–82

Model-based social recommender system 83

Model parallelism 113–114, 114f

Modified genetic algorithm (GA) 345–346

flowchart 344, 347f

sentence clustering 346–347

Monetary cost optimizations 182–183

WaaS providers 445–447

Montage workflows 442–445, 443–444f

Morphological Image Processing-based Scheme (MIPS) 331

MPC See Markov predictive control (MPC)

MPI technology See Message passing interface (MPI) technology

MSFSS 224–225

Multi-GPUs

data-model parallelism 114, 115f

data parallelism 113, 113f

example system of 114

model parallelism 113–114, 114f

Multiresource management, in Cloud

free riding 171–172

gain-as-you-contribute fairness 171–172

multiresource fair sharing issues 174–175, 174t

reciprocal resource fairness 172, 175–179

resource allocation model 172–174, 173f

sharing incentive 171–172

N

Naïve Bayes 360–362

Naïve Bayes method 333

Named entity recognition (NER) 68–69, 68f

applications 71

features 70, 71t

pipeline 69, 69f

statistical NLP methods 69–70

tags and evaluation 71

trends in 71–72

Natural language processing (NLP) techniques

applications 63–72

language detection

alphabet-based LD 65–66, 65f

byte n-gram-based LD 66–67

combined system 67–68

dictionary-based LD 66, 66f

graph-based n-gram approach 65

n-gram-based approach 64

NER 68–69, 68f

applications 71

pipeline 69, 69f

statistical NLP methods 69–70

tags and evaluation 71

trends in 71–72

on Twitter 71–72

in recommender systems 85

Navigational databases 139–140, 140f

hierarchical model 140

network model 140

NER See Named entity recognition (NER)

Network resource management 163

Network science 119–120

N-gram-based approach 64

NLP techniques See Natural language processing (NLP) techniques

NodeManager (NM) 163–164

Normal forms 141

Not only SQL (NoSQL)

BASE, features of 145

CAP theorem 144–145, 145f

horizontal scalability 147, 147f

join operations 149

linear scalability 146

replicating data nodes 148

core concepts 143

database characteristics 143

column-based stores 151–152

document-based stores 154–156

graph-based stores 153–154

key-value stores 150–151

graph databases 120

O

Observed distribution 78

Online clustering 79

Opinion summarization, clustering-based 340, 344–348

Ownership, of cloud infrastructures 437–438

P

Packing algorithms

Big Data replay at scale 255–256, 255f

Drop vs. Drag 256, 261–262

shared memory performance tricks 253–254

Parallel frameworks, for deep learning

Caffe See Convolutional architecture for fast feature embedding (Caffe)

DistBelief 111–112

multi-GPUs 112–114

Parallel processing 42

Pattern recognition 426

PDC See Phasor data concentrator (PDC)

Pearson correlation coefficient 372

Pegasus workflow management system 442, 449, 450f

Phasor data concentrator (PDC) 417

Smart Grid with 418

traditional workflow 418–419

Phasor measurement unit (PMU) 417–418

data processing 427

features 426–427

known line events 423–426

Smart Grid with 418

traditional workflow 418–419

PINQ framework See Privacy integrated queries (PINQ) framework

Platform as a Service (PaaS) 441

PMU See Phasor measurement unit (PMU)

PouchDB 273–274

Pregel 120, 164

Principal component analysis (PCA) 88

Privacy

anonymity 298–300, 298–299f

correlated data in Big Data 296–298

differential 300–304

correlated data publication 302–304

definitions 300

optimization 300–301

PINQ framework 302, 303f

Privacy integrated queries (PINQ) framework 302, 303f

Private clouds 437–438

Programmatic EPSs 50–52

Public clouds 438

Q

Quality-of-experience (QoE)

optimization problem 389–390, 403–404

performance 410–411

with prefetch bandwidth 407–408, 408f

Query-based EPSs 48–49

Query language 140

R

RADAR-based tracking 318–319, 319f

Random forest (RF), tweets sentiment analysis 366–368, 367t, 369f

Rank aggregation algorithms 391, 400–401

Real-time analytics

characteristics 41–43

high availability 42–43

horizontal scalability 43

complex event processing 44–45

computing abstractions for 40–41

data stream processing 44

Amazon Kinesis 48

event pattern 45

event processing 44

event stream processing 44

finance domain requirements

CEP application 55–58

real-time analytics in 54–55

selected scenarios 55

Received signal strength (RSS)

distribution 330, 331f

mapping of floor plan and 314–317

trajectories matching 311–313, 312f

Reciprocal resource fairness (RRF) 172

application performance 181–182, 181f

economic fairness 180–181, 180f

IaaS clouds 179

intertenant resource trading 175–178, 176f, 177b

intratenant weight adjustment 176f, 178–179, 178b

Recommender systems, text mining

evaluation metrics for 84

ranking accuracy 69, 85

rating prediction accuracy 84

social recommender systems 82–83

usage prediction accuracy 84

Recursive neural tensor networks (RNTN) 75, 75f

Relational algebra 140

Relational Database Management Systems (RDBMSs) 140

Relational data models 140–143

data modeling process 141

join operations 141–142, 142f

query language 140

relational algebra 140

schema normalization 141–142

tabular organization 141, 141f

transactions 142

two-phase commit 142–143

Replay method 239, 243–244, 250–252

jobs as sketches on timeline 251–252

on multicore method 250, 250f

performance bottlenecks under 252

representation 251

at scale, packing algorithms 255–256, 255f

Replicating data node 148

Resilient distributed dataset (RDD) 27, 46–47, 165

Resource consumption shaping 189

Resource management

Big Data analysis

Hadoop 163–164

cloud computing

desired resource allocation properties 166–167

free riding 171–172

gain-as-you-contribute fairness 171–172

long-term resource allocation policy 168–170

lying 171–172

multiresource fair sharing issues 174–175, 174t

reciprocal resource fairness 172, 175–179

resource allocation model 172–174, 173f

resources-as-you-pay fairness 168

sharing incentive 171–172

strategy-proofness problem 167

trivial workload problem 167

CPU and memory 162

fairness optimization 183

monetary cost optimization 182–183

ResourceManager (RM) 163–164

Resource sharing 161–162

Rice University Bulletin Board System (RUBBoS) 179

Root mean squared error (RMSE) 84

RRWM algorithm 316

Rule-based approaches, text mining 73

Rule-oriented EPSs

event-condition-action rules 50

production rules 49–50

S

Sandblaster batch optimization framework (L-BFGS) 111–112, 112f

SC See Silhouette coefficient (SC) See also Spectral clustering (SC)

Scalability

database systems 147

of deep models 116

distributed systems 146

real-time analytics 43

Scale-free (SF) degree distribution 121

Scaling metrics 123

Searchable encryption (SE) scheme 289

Searchable symmetric encryption (SSE) scheme 289

Search queries, NER 71

Security

in cloud computing 286

digital watermarking 295–296

eScience applications 440

queries over encrypted Big Data 287–295

index-based secure query scheme 290–295

self-adaptive risk access control 296

Segment-based interpolation approach, CBID system 326

Self-adaptive MapReduce (SAMR) 220

Self-adaptive risk access control 296

SENIL 310, 311f, 313, 318–319

Sentence clustering process 346–348

Sentiment analysis

text mining 72–73, 76–77

Lexicon-based approach 73

rule-based approaches 73

statistical methods 73–76

weather and Twitter 357

back-end architecture 358–359, 359f

Big Data system components 358–360

classifier models 365

color degree feature 363

daily data analysis 380–381, 381–382f

DBSCAN cluster algorithm 382–383, 383f

front-end architecture 359, 360f

hourly data analysis 378, 379–381f

impact on emotion 383–384, 386–387f

logistic regression 368–369, 369t, 370f

machine-learning methodology 360–373

in pattern module 363f

random forest 366–368, 367t, 369f

score feature 363

smile detection feature 364

straightforward weather impact on emotion 383–384

support vector machine 365–366, 366–367f

system implementation 373–378

time series 378

XDOM 342–344, 345f

SE scheme See Searchable encryption (SE) scheme

Shared memory

modeling methodology 258–259

on-chip version 244

packing algorithms 253–254

parameter spaces for 244–245

performance 248–250

performance bottlenecks 242, 252, 259–260, 260f

replay method 252

SSD/HDD vs. 245f

storage and 244

Shared-nothing data processing 24

Silhouette coefficient (SC) 373

Single points of failure (SPOF) 240, 251

Single-resource management, in Cloud 166–170

desired resource allocation properties 166–167

long-term resource allocation policy 168–170

experimental evaluation 170, 171f

motivation example 168

scheduling algorithm 168–170

resources-as-you-pay fairness 168

strategy-proofness problem 167

trivial workload problem 167

Skeleton-based matching 315

Small-world networks 121

Small-world phenomenon 121

Smart Grid 417, 426–427

characterizing normal operation 419

cumulative probability distribution 421

identifying unusual phenomena 420–422

improving traditional workflow 418–419

known events identification 423–426

with PMUs and PDCs 418

Smile detection, feature 364

Social networks

anonymity for 299–300

Big Data and data analytics 270

Cloud-based Big Data collection

architecture 274, 274f

bounding box tweet retrieval 274, 275f

thin client paradigm 275

correlations in 296–298

graph 296–297, 296f

location-based services 270–271

location privacy 275–281

consequences 280–281

location losing privacy 276

reveal location privacy 276

privacy management 270

social media software systems

Google Plus 271–272

tracking users, via tweets 269, 270f

Social recommender systems 82–83

Software stack 360, 361f

Sparse matrix-vector multiplications (SpMVs) 125

AS-BFS 125–127

Spectral clustering (SC) 315

Speculative execution mechanism 219

SPOF See Single points of failure (SPOF)

SSE scheme See Searchable symmetric encryption (SSE) scheme

Stanford Rapide project 44

Statistical analysis 53

Statistical data analysis tools 53

Statistical methods, text mining 73–76

Statistics domain 13

Storage

modeling methodology 258–259

parallel threads in 245

parameter spaces for 244–245, 245f

performance 245–248

Storage resource management (SRM) 163

Stored data analytics platforms 41

Stored data processing platforms 41

Structured Query Language (SQL) 140

Support vector machines (SVMs) 426–427

tweets sentiment analysis 365–366, 366–367f

T

Text mining

recommender systems

evaluation metrics for 84

ranking accuracy 85

social recommender systems 82–83

sentiment analysis 72–73

domain adaptation 76–77

Lexicon-based approach 73

rule-based approaches 73

statistical methods 73–76

trending topics

detection system 79

document pivot method 77

extract n-grams 79–80

Jaccard similarity 80

online clustering 79

ranking clusters 80–81

Text watermarking 295–296

Thread, CUDA 107

Tietjen-Moore test 87

Tiled MapReduce method 240

Time series analysis, weather/Twitter sentiment analysis 372, 378

Transfer error rate 76

Transformation-based optimizations framework (TOF) 447–449, 448f

Translation, NER 71

Trapdoor algorithm 290

Trending topics, text mining

detection system 79

document pivot method 77–78

extract n-grams 79–80

feature pivot method 77–78

Jaccard similarity 80

online clustering 79

ranking clusters 80–81

Trust- and influence-based links 83

Two-phase commit 142–143

V

Validation procedure, DOM 352–353

Video-on-demand (VoD) 389–390, 398

adaptive video caching framework 396

cellular network 392–393

core and edge components 397–398f, 400

data generation 399

INCA caching algorithm 401–402

Markov processes 393–394

QoE estimation 403

synthetic dataset 409–412

theoretical framework 403–404

wireless request processing 393f

Virtual machines (VMs) 161–162

VoD See Video-on-demand (VoD)

Voltage deviation 422f

normal operation 419, 420–421f

3² Vs of Big Data 13, 14–15f

3Vs of Big Data (Gartner) 7–8, 9f

4Vs of Big Data (IBM) 8, 9f

6Vs of Big Data (Microsoft) 8–9, 9f

W

WaaS See Workflow-as-a-service (WaaS)

WAMS See Wide area measurement system (WAMS)

Watermarking

digital 295–296

Wide area measurement system (WAMS) 427

Wireless network analytics, applications of 390f

Wireless service providers (WSPs) 395

WLog program 449–450, 450t

WMSes See Workflow management systems (WMSes)

Workflow-as-a-service (WaaS) 445–446

Workflow in IaaS clouds

complex structures 443

declarative optimization engine 449–451

diverse cloud offerings 442

monetary cost optimizations 445–447

resource provisioning 442

transformation-based optimizations framework 447–449, 448f

Workflow management systems (WMSes) 439, 449

WSPs See Wireless service providers (WSPs)

X

XDOM (eXtension of DOM) 339–340

clustering-based summarization framework 344–348, 345f

data sources 340–341, 341f

implementation 350–351

influencer analysis 349

MapReduce framework 342–343

sentiment analysis 343–344, 345f

system architecture 341–342

validation procedure 352–353

Y

Yet Another Resource Negotiator (YARN) 46, 163–164, 183, 205

Z

ZooKeeper 46, 229–230