Index

Note: Page numbers followed by b indicate boxes, f indicate figures, and t indicate tables.

A

ACID transactions 142
Adaptive CUSUM algorithm 320–322, 321f
Adjusted Rand index (ARI) 373
Affinity propagation 372
All-sources BFS (AS-BFS) on GPU 121
algorithms for accelerating 125–126
performance study of 126–128
Alphabet-based LD 65–66, 65f
ANN  See Artificial neural network (ANN)
Anomaly detection 85–88
accuracy and time to detect 88
Grubbs test 86–87
Kalman filtering 88
text data streams 88
Tietjen-Moore test 87
Anonymization 298–300, 298–299f
data releasing 298–299
social networks 299–300
Antenna movement model, CBID system 322–323, 323f
Anticipatory fetching 42
Apache Hadoop framework 342, 350–351
Apache Software Foundation (ASF) 20
ApplicationMaster 163–164
ARI  See Adjusted Rand index (ARI)
Artificial neural network (ANN) 96–98, 97–98f
AskDOM 341–342, 342f, 350, 350f
Aspect-based sentiment analysis 74, 74f
AS-Skitter graph, decomposition 131f
Association rule mining 391
Attack model 288–289, 289f
Autoencoders 99–100, 100f

B

Backpropagation (BP) algorithm 100–101, 101–102f
Barabási-Albert random graph model 127
Barrierless MapReduce 218
Base transceiver station (BTS) 309
Bayesian differential privacy 303–304
Big Data 
alternative platforms for 33t
applications 
batch processing 215
Hadoop 216
HBase 216
HDFS 216
MapReduce 216
stream processing 215
tools 216, 216t
business intelligent domain 11–13
comprehensive meaning 35f
data domain 11
definition 
methodology for 7, 8f
motivations 10–11
type 10t
3² Vs definition 13
3Vs definition (Gartner) 7–8, 9f
4Vs definition (IBM) 8, 9f
6Vs definition (Microsoft) 8–9, 9f
Big Data analytics (BDA) 14
case study 357–384
Binary Large Objects (BLOB) 150
Bin packing problem 255
Biomedical data, NER 71
Block, CUDA 107
Bolt process 47
Bonneville Power Administration (BPA) 417–418, 423
Breadth-first search (BFS) 
complex networks 124
frontier 126
BTS  See Base transceiver station (BTS)
Business intelligent (BI) domain 11–13
Byte n-gram-based LD 66–67

C

Caffe  See Convolutional architecture for fast feature embedding (Caffe)
Call data records (CDRs) 391
Capacity Scheduling 220–221
CBID system  See Customer behavior identification (CBID) system
CDN  See Content delivery network (CDN)
CDRs  See Call data records (CDRs)
Cellular network, video-on-demand 392–393
Centrality metrics 123
CEP  See Complex event processing (CEP)
CF-based recommender systems 81
Classical Gilbert graph model 127
Cloud computing (CC) 285–286
resource management 
desired resource allocation properties 166–167
free riding 171–172
gain-as-you-contribute fairness 171–172
long-term resource allocation policy 168–170
multiresource fair sharing issues 174–175, 174t
reciprocal resource fairness 172, 175–179
resource allocation model 172–174, 173f
resources-as-you-pay fairness 168
sharing incentive 171–172
strategy-proofness problem 167
trivial workload problem 167
scheme for 290, 290f
secure queries in 286–295
Cluster algorithms 371–372, 372f
Clustering-based opinion summarization 340, 344–348, 345f
Clustering metrics 123
Clusters, ranking 80–81
Coarse-grained propagation model 317
Collaborative filtering analysis 391, 398, 400–401
Community cloud  See Federated cloud
Community structure 122
Complex event processing (CEP) 44–45
EventSwarm service 57
example 57f
for financial market data processing 55–58
Complex networks 119
AS-BFS on GPU 
algorithms for accelerating 125–126
performance study of 126–128
BFS 124
characterization and measurement 121–123
heterogeneous computing 
graph partitioning for 129
of graphs 129t
HPC traversal 124–125
k-core 129–133
metrics 122t
patterns on 123t
Compositional sentiment analysis 75
Compute Unified Device Architecture (CUDA) programming 105–107, 107f
Conditional random field (CRF) 70, 70f
Consistent hashing 151
Content-based recommender systems 81
Content delivery network (CDN) 257
Conventional machine learning model 98
Convolutional architecture for fast feature embedding (Caffe) 104
convolution, parallel implementation 109–111, 110f
CUDA programming 105–107
data storage 107
development 104t
execution mode of 108f
layer communication 108f
layer topology in 107–109
LeNet topology in 109f
Convolutional neural networks (CNNs) 101–102
architecture overview of 102, 103f
convolutional layer 103
full connection layer 104
input layer 102
local connectivity 103
pooling layer 103, 104f
Correlation analysis 286
CBID system 322–328
differential privacy 302–304
privacy 296–304, 297f
COTS HPC system 114
CPU resource management 162
CQL 49
Create, read, update, and delete (CRUD) operations 142
CRF  See Conditional random field (CRF)
CRM and movie-watched information 391
Crowdsourcing techniques 309–311
CUDA programming  See Compute Unified Device Architecture (CUDA) programming
Customer behavior identification (CBID) system 319–328
explicit correlation 322–324
antenna movement model 322–323, 323f
IMR 323–324, 325f
implicit correlation 325–328, 327f
iterative clustering algorithm 326–328, 328f
problem formulation 325–326
segment-based interpolation approach 326
objectives 320f
popular item 320–322

D

Data analysis 
data processing tools 53
phases 53
Data and Opinion Mining (DOM) 339–340
conceptual framework 342f
core functions 342
implementation 350–351
Core Service 351
I/O 351
server section 350–351
system architecture 341–342
Database 139
Database Management System (DBMS) 139
future directions 156–157
navigational databases 139–140, 140f
hierarchical model 140
network model 140
NoSQL  See Not only SQL (NoSQL)
relational data models 140–143
data modeling process 141
join operations 141–142, 142f
query language 140
relational algebra 140
schema normalization 141–142
tabular organization 141, 141f
transactions 142
two-phase commit 142–143
Database management systems 53
Data cleansing 53
Data collection 53
Data domain 11
Data mining  See Knowledge discovery in database (KDD)
Data-model parallelism 114, 115f
Data parallelism 113, 113f
Data preprocessing 351
example of 55f
human object estimation 329–330
Data processing engine 
comparison 30t
phasor measurement unit 427
Data stream analytics platforms 41
programmatic EPSs 50–52
query-based EPSs 48–49
rule-oriented EPSs 49–50
Data streaming 240–241, 243
Data stream processing 44
Amazon Kinesis 48
Flume 48
Hadoop ecosystem 45–46
Kafka 47–48
platforms 40–41
Spark 46–47
Storm 47
Data transformation 53
DCT  See Discrete cosine transform (DCT)
Declarative optimization engine, IaaS clouds 449–451
Deep learning 
application background 95
artificial neural networks 96–98
autoencoders 99–100
backpropagation 100–101
Caffe  See Convolutional architecture for fast feature embedding (Caffe)
challenges 
learning speed 116
scalability 116
streaming data 116
training samples 115–116
concept of 98–99
convolutional neural network 101–104
DistBelief 111–112
and multi-GPUs 112–114
parallel frameworks 96
performance demands for 96
Degree centrality 349
Degree metrics 122
Density-based spatial clustering of applications with noise (DBSCAN) cluster algorithm 372
definition 373
dimensional reduction analysis 382–383, 382–383t, 383–384f
pair variable analysis 383, 384t, 385f
Device-based sensing approaches 310–319
floor plan and RSS readings mapping 314–317
unsupervised mapping 315–317
graph matching based tracking 318
overview 310–311
RSS trajectories matching 311–313, 312f
directional shadowing problem 311
fingerprints extraction 311–313
fingerprints transition graph 313, 314f
user localization 318
Device-free sensing approaches 310, 319–334
customer behavior identification 319–328
explicit correlation 322–324
implicit correlation 325–328
popular item 320–322
human object estimation 328–334
data preprocessing 329–330
feature extraction 330–333
machine learning-based estimation 333–334
Dictionary-based LD 66, 66f
Differential privacy 300–304
approaches 302
Bayesian 303–304
definitions 300
Gaussian correlation model 304
for histogram 302
K-means clustering 302
optimization 300–301
PINQ framework 302, 303f
Digital watermarking 295–296
Dijkstra’s algorithm 373
Direction-of-Arrival (DoA) detection 311
Discrete cosine transform (DCT) 330, 333
Distance metrics 123
DistBelief 111–112
DLLs  See Double linked lists (DLLs)
DoA detection  See Direction-of-Arrival (DoA) detection
Document pivot method 77–78
Documents embedding 155
DOM  See Data and Opinion Mining (DOM)
Domain adaptation 
NER 70
text mining 76–77
Dominant resource fairness (DRF) 222–223
Double linked lists (DLLs) 254
Downpour SGD 111, 112f
Drag model 256, 261–262, 261f
Drop model 256, 261–262, 261f

E

EDRs  See Event data records (EDRs)
EGI Federated Cloud Task Force 438
Encrypted cloud data 285–286
search over architecture 287f
secure queries over 287–295
attack model 288–289, 289f
index-based secure query scheme 290–295
SE scheme 289
SSE scheme 289
system model 287
threat model 288, 288f
Encryption 
head node 293, 293f
intermediate nodes 292, 292f
secure inner product preserving 294, 295f
eScience 
cloud computing 431–432, 435–436, 436f
grid-based 434–435
Event(s) 
expression 51
pattern detection 52
processing system 44, 52t
real-time analytics 43
Event-condition-action (ECA) rules 50
Event data records (EDRs) 391, 400–401
Event pattern 45
for duplicate dividends 56t
for earnings calculation 56t
Event processing languages (EPLs) 44–45
Event stream processing 44
EventSwarm software framework 50–51, 51f
Explicit social links 83
Exponentially weighted moving average (EWMA) 86
Extraction, transformation, and load (ETL) 3
Extract n-grams 79–80

F

Fair resource sharing 
Hadoop framework 191–192
TaskTracker 191–192
Feature pivot method 77–78
Federated cloud 438
Filter, EPS 51–52
Finance domain requirements 
data pre-processing 55f
real-time analytics in 54–55
First-in-first-out (FIFO) scheduling algorithm 220–221
First Normal Form (1NF) 141
Flash technology 42
FlatLFS 224–225
Friis Equation 322

G

Gartner’s interpretation  See 3Vs of Big Data
Gaussian correlation model 304
GFS  See Google File System (GFS)
GIG  See Grid Infrastructure Group (GIG)
Global positioning system (GPS) 309, 417
Google File System (GFS) 20–23
architecture 22f
designing 22
types 22
GPS  See Global positioning system (GPS)
Graph API 340–341
Graph-based n-gram approach (LIGA) 65
Graphics processing units (GPUs) 124
architecture of 105–106
AS-BFS on 
algorithms for accelerating 125–126
performance study of 126–128
performance 105f
simplified architecture of 106f
Graph matching algorithm 315
corridor points matching 317, 317–318f
graphs normalization 316
rooms points matching 317, 318f
skeleton graph extraction 315
skeletons matching 316
Graph-matching-based tracking 318
Graph partitioning strategy 120
for heterogeneous computing 128–129
Grid-based eScience 434–435
Grid, CUDA 107
Grubbs test 86–87

H

Haar cascade algorithm 364, 364f
Hadoop 163–164, 216
advantages 20
availability optimization 232
creation 19
development 18
disadvantages 20
distinguishing features 33
ecosystems 32–33
efficiency optimization 
CoHadoop 231
fault tolerance 231
flow mechanism 231
MapReduce computation models 231
Matchmaking 231
prediction-execution strategy 232
framework 19f, 217, 217f
GFS 20–23
HBase 
application optimization 229
framework 228–229
load balancing 229–230
read-and-write optimization 230
storage 229
HDFS 
security enhancements 226–228
small file performance optimization 224–226
history of 22f
job management framework 223
job scheduling mechanism 
BalancedPool algorithm 221–222
capacity scheduling 220–221
dominant resource fairness 222–223
FIFO scheduling algorithm 220–221
HFS scheduling algorithm 220–221
MTSD 221–222
key functions 33
Lucene 25–27, 26f
Nutch 26–27
scalability 31–32
scale-up and scale-out 19
Hadoop Distributed File System (HDFS) 20–23, 239, 440
architecture 22f
real-time analytics 46
security enhancements 
authorization 226
certification 226
data disaster recovery 226
novel method 227
token-based authentication mechanisms 226–227
small file performance optimization 
FlatLFS 224–225
Har filing system 224–225
hierarchy index file merging 225–226
issues and solutions 224
MSFSS 224–225
SFSA strategies 224–225
SmartFS 225–226
write/read limits 242
Hadoop Fair Scheduling (HFS) algorithm 220–221
Hadoop/MapReduce 239
performance bottlenecks 241–243, 241f
bulk storage 242
network 241
under parallel loads 243–244, 243f
shared memory 242, 244–245, 245f, 248–250
storage 244–248, 245f
Hadoop schedulers 190
HaLoop 218–219
Hard disk drives 140
HBase 
application optimization 229
framework 228–229
load balancing 229–230
read-and-write optimization 230
storage 229
HDD/SSD 244
disk 242
parameter spaces 245f
HDFS  See Hadoop Distributed File System (HDFS)
Heterogeneous computing 
goal of 128
graph density 129
graph partitioning for 129
of graphs 129t
partitioning 128
switching 128
Hierarchical clustering 339–340
High-frequency algorithmic trading 54
High-performance computing (HPC) 434, 437, 441
Big Data processing and 241–242, 241f
NoSQL graph databases 120
traversal of large networks 124–125
Hill-climbing method 339–340
Histogram query, differential privacy for 302
Hive, real-time analytics 46
Hotspot distribution 242, 256–258, 258f
HPC  See High-performance computing (HPC)
Human object estimation 328–334
data preprocessing 329–330
feature extraction 330–333
machine learning-based estimation 333–334
Hungarian algorithm 316
Hybrid cloud 438

I

IDC algorithm  See Iterative database construction (IDC) algorithm
Implicit social links 83
IMR  See Integration of Multi-RSS (IMR)
INCA  See Intelligent network caching algorithm (INCA)
INCA caching algorithm 401–402
Incremental evaluation 42
Index-based secure query scheme 
for cloud computing 290, 290f
definition 291
implementations 291–295, 291–295f
Index-free adjacency technique 153–154
InfiniteGraph 120
Information explosion 4
In-memory processing 42
Integration of Multi-RSS (IMR) 323–324, 325f
Intelligent network caching algorithm (INCA) 390
cache hits 410, 410f
vs. online algorithm 407
QoE 
estimation 403
optimization problem 389–390, 403–404
performance 410–411
with prefetch bandwidth 407–408, 408f
satisfied users 412, 413f
Interleave MapReduce scheduler 
slot manager 196–197
task dispatcher 
map task scheduling 197
reduce task scheduling 197
task slot 196, 196f
Internet of Things (IoT) devices 309–310
device-based sensing approaches 310–319
evaluation 318–319
floor plan and RSS readings mapping 314–317
graph matching based tracking 318
overview 310–311
RSS trajectories matching 311–313, 312f
user localization 318
device-free sensing approaches 310, 319–334
customer behavior identification 319–328
human object estimation 328–334
Intertenant resource trading (IRT) 175–178, 176f, 177b
Intratenant weight adjustment (IWA) 176f, 178–179, 178b
Inverted index 292–294
structure 291–292, 291f
table 229
IoT devices  See Internet of Things (IoT) devices
Isomap method 373
Iterative clustering algorithm 
with cosine similarity 326–328
example 328f
Iterative database construction (IDC) algorithm 301

J

Jaccard similarity 80

K

Kafka 47–48
Kahn process networks (KPNs) 218
Kalman filtering 88
K-core-based complex-network unbalanced bisection (KCMax) 129–133
AS-Skitter graph decomposition 131f
dense partition produced by 132t
sparse partition produced by 132t
K-means clustering 76–77, 302
Knowledge discovery in database (KDD) 16

L

Label bias problem 70
Lambda architecture 29
elements of 31f
implementation 32f
process steps of 31f
speed layer 32
Language detection (LD) 
alphabet-based LD 65–66, 65f
byte n-gram-based LD 66–67
combined system 67–68, 68f
dictionary-based LD 66, 66f
graph-based n-gram approach 65
n-gram-based approach 64
user language profile 67
Language identification  See Language detection (LD)
Laplace-Beltrami eigenvalues (LBE) 316
Large dataset 96
Large-scale deep networks 96
Large Synoptic Survey Telescope (LSST) 431
Latent Dirichlet allocation (LDA) 74
LBE  See Laplace-Beltrami eigenvalues (LBE)
LBS  See Location-based services (LBS)
LD  See Language detection (LD)
Lexicon-based approach 73
Load balance 125
Locality sensitive hashing (LSH) 78
Local resource shaper (LRS) 
architecture 194f
Capacity scheduler 211
challenges 190
Delay scheduler 211
design philosophy 194
Hadoop schedulers 190
Hadoop 1.X experiments 198–204
Hadoop 2.X experiments 204–210
Hadoop YARN 191
Interleave 190, 194–198
interleave MapReduce scheduler 
slot manager 196–197
task dispatcher 197–198
task slot 196, 196f
MapReduce benchmarks 190, 191t
resource consumption shaping 210
Splitter 190, 194–195
VM placement and scheduling strategies 210
Location-based services (LBS) 309
Lockfree design 242, 254–255
Lockfree shared memory design 240–241
Logistic regression (LR) 368–369, 369t, 370f
Long-term resource fairness (LTRF) 
cloud computing 
experimental evaluation 170, 171f
vs. MLRF 169t
motivation example 168
scheduling algorithm 168–170
Lower control limit (LCL) 86
LR  See Logistic regression (LR)

M

Machine learning (ML) 358, 360–373
classification process in 98f
definition 14
process 15–17, 16f
tweets sentiment analysis 361–369
classifier models 365
color degree feature 363
feature engineering 362
logistic regression 368–369, 369t, 370f
Naïve Bayes as baseline 362
in pattern module 363f
preprocessing 362
random forest 366–368, 367t, 369f
score feature 363
smile detection feature 364, 364f
support vector machine 365–366, 366–367f
training set 362
Machine learning-based estimation 333–334
Manifold algorithm 373
MapReduce 24
Barrierless MapReduce 218
comparison of 219, 219t
HaLoop 218–219
KPNs 218
load balancing mechanism 220
Map-Reduce-Merge 218
process 24
real-time analytics 46
steps 23f
stream-based 218–219
task scheduling strategy 219
MapReduce framework, XDOM 342–343
MapReduce-like models 120
Map-Reduce-Merge 218
Markov decision processes (MDP) 390, 393–394, 394f, 396
Markov predictive control (MPC) 390, 396
Maximum entropy (ME) models 69
MDP  See Markov decision processes (MDP)
Mean absolute error (MAE) 84
Memory-based recommender systems 82
Memory-based social recommender system 83
Memoryless resource fairness (MLRF) 166
Memory Map method 253–254
Memory resource management 162
Message passing interface (MPI) technology 242
Minkowski distance 312
MIPS  See Morphological Image Processing-based Scheme (MIPS)
ML  See Machine learning (ML)
Mobile devices 389
Model-based recommender systems 81–82
Model-based social recommender system 83
Model parallelism 113–114, 114f
Modified genetic algorithm (GA) 345–346
flowchart 344, 347f
sentence clustering 346–347
Monetary cost optimizations 182–183
WaaS providers 445–447
Montage workflows 442–445, 443–444f
Morphological Image Processing-based Scheme (MIPS) 331
MPC  See Markov predictive control (MPC)
MPI technology  See Message passing interface (MPI) technology
MSFSS 224–225
Multi-GPUs 
data-model parallelism 114, 115f
data parallelism 113, 113f
example system of 114
model parallelism 113–114, 114f
Multiresource management, in Cloud 
free riding 171–172
gain-as-you-contribute fairness 171–172
multiresource fair sharing issues 174–175, 174t
reciprocal resource fairness 172, 175–179
resource allocation model 172–174, 173f
sharing incentive 171–172

N

Naïve Bayes 360–362
Naïve Bayes method 333
Named entity recognition (NER) 68–69, 68f
applications 71
CRF 70
features 70, 71t
pipeline 69, 69f
statistical NLP methods 69–70
tags and evaluation 71
trends in 71–72
Natural language processing (NLP) techniques 
applications 63–72
language detection 
alphabet-based LD 65–66, 65f
byte n-gram-based LD 66–67
combined system 67–68
dictionary-based LD 66, 66f
graph-based n-gram approach 65
n-gram-based approach 64
NER 68–69, 68f
applications 71
CRF 70
features 70
pipeline 69, 69f
statistical NLP methods 69–70
tags and evaluation 71
trends in 71–72
on Twitter 71–72
in recommender systems 85
Navigational databases 139–140, 140f
hierarchical model 140
network model 140
Neo4j 120
NER  See Named entity recognition (NER)
Network resource management 163
Network science 119–120
N-gram-based approach 64
NLP techniques  See Natural language processing (NLP) techniques
NodeManager (NM) 163–164
Normal forms 141
Not only SQL (NoSQL) 
for Big Data 
BASE, features of 145
CAP theorem 144–145, 145f
horizontal scalability 147, 147f
join operations 149
linear scalability 146
replicating data nodes 148
core concepts 143
database characteristics 143
data models 
column-based stores 151–152
document-based stores 154–156
graph-based stores 153–154
key-value stores 150–151
definition 143
graph databases 120

O

Observed distribution 78
Online clustering 79
Opinion summarization, clustering-based 340, 344–348
Ownership, of cloud infrastructures 437–438

P

Packing algorithms 
Big Data replay at scale 255–256, 255f
Drop vs. Drag 256, 261–262
shared memory performance tricks 253–254
Parallel frameworks, for deep learning 
Caffe  See Convolutional architecture for fast feature embedding (Caffe)
DistBelief 111–112
multi-GPUs 112–114
Parallel processing 42
Pattern recognition 426
PDC  See Phasor data concentrator (PDC)
Pearson correlation coefficient 372
Pegasus workflow management system 442, 449, 450f
Phasor data concentrator (PDC) 417
Smart Grid with 418
traditional workflow 418–419
Phasor measurement unit (PMU) 417–418
data processing 427
features 426–427
known line events 423–426
Smart Grid with 418
SVMs 427
traditional workflow 418–419
PINQ framework  See Privacy integrated queries (PINQ) framework
Platform as a Service (PaaS) 441
PMU  See Phasor measurement unit (PMU)
PouchDB 273–274
Pregel 120, 164
Principal component analysis (PCA) 88
Privacy 286
anonymity 298–300, 298–299f
correlated data in Big Data 296–298
differential 300–304
approaches 302
correlated data publication 302–304
definitions 300
optimization 300–301
PINQ framework 302, 303f
Privacy integrated queries (PINQ) framework 302, 303f
Private clouds 437–438
Programmatic EPSs 50–52
Public clouds 438

Q

Quality-of-experience (QoE) 
estimation 403
optimization problem 389–390, 403–404
performance 410–411
with prefetch bandwidth 407–408, 408f
Query-based EPSs 48–49
Query language 140

R

RADAR-based tracking 318–319, 319f
Random forest (RF), tweets sentiment analysis 366–368, 367t, 369f
Rank aggregation algorithms 391, 400–401
Real-time analytics 
challenges 58
characteristics 41–43
high availability 42–43
horizontal scalability 43
low latency 42
complex event processing 44–45
computing abstractions for 40–41
data stream processing 44
Amazon Kinesis 48
Flume 48
Kafka 47–48
Spark 46–47
Storm 47
event 43
event pattern 45
event processing 44
event stream processing 44
event type 45
finance domain requirements 
CEP application 55–58
real-time analytics in 54–55
selected scenarios 55
latency 42
stack 40f
Received signal strength (RSS) 
CDF 330, 330f
distribution 330, 331f
mapping of floor plan and 314–317
trajectories matching 311–313, 312f
Reciprocal resource fairness (RRF) 172
application performance 181–182, 181f
economic fairness 180–181, 180f
IaaS clouds 179
intertenant resource trading 175–178, 176f, 177b
intratenant weight adjustment 176f, 178–179, 178b
workloads 179
Recommender systems, text mining 
datasets 83
evaluation metrics for 84
NLP in 85
ranking accuracy 69, 85
rating prediction accuracy 84
social recommender systems 82–83
types 81–82
usage prediction accuracy 84
Recursive neural tensor networks (RNTN) 75, 75f
Relational algebra 140
Relational Database Management Systems (RDBMSs) 140
Relational data models 140–143
data modeling process 141
join operations 141–142, 142f
query language 140
relational algebra 140
schema normalization 141–142
tabular organization 141, 141f
transactions 142
two-phase commit 142–143
Replay method 239, 243–244, 250–252
jobs as sketches on timeline 251–252
on multicore method 250, 250f
performance bottlenecks under 252
representation 251
at scale, packing algorithms 255–256, 255f
Replicating data node 148
Resilient distributed dataset (RDD) 27, 46–47, 165
Resource consumption shaping 189
Resource management 
Big Data analysis 
Dryad 164
Hadoop 163–164
Pregel 164
Spark 165
Storm 164
cloud computing 
desired resource allocation properties 166–167
free riding 171–172
gain-as-you-contribute fairness 171–172
long-term resource allocation policy 168–170
lying 171–172
multiresource fair sharing issues 174–175, 174t
reciprocal resource fairness 172, 175–179
resource allocation model 172–174, 173f
resources-as-you-pay fairness 168
sharing incentive 171–172
strategy-proofness problem 167
trivial workload problem 167
CPU and memory 162
fairness optimization 183
monetary cost optimization 182–183
network 163
storage 163
ResourceManager (RM) 163–164
Resource sharing 161–162
Rice University Bulletin Board System (RUBBoS) 179
Root mean squared error (RMSE) 84
RRWM algorithm 316
Rule-based approaches, text mining 73
Rule-oriented EPSs 
event-condition-action rules 50
production rules 49–50

S

Sandblaster batch optimization framework (L-BFGS) 111–112, 112f
SC  See Silhouette coefficient (SC); Spectral clustering (SC)
Scalability 
database systems 147
of deep models 116
distributed systems 146
real-time analytics 43
Scale-free (SF) degree distribution 121
Scaling metrics 123
Searchable encryption (SE) scheme 289
Searchable symmetric encryption (SSE) scheme 289
Search queries, NER 71
Security 286
in cloud computing 286
digital watermarking 295–296
eScience applications 440
queries over encrypted Big Data 287–295
index-based secure query scheme 290–295
SE scheme 289
SSE scheme 289
self-adaptive risk access control 296
Segment-based interpolation approach, CBID system 326
Self-adaptive MapReduce (SAMR) 220
Self-adaptive risk access control 296
SENIL 310, 311f, 313, 318–319
Sentence clustering process 346–348
Sentiment analysis 
text mining 72–73, 76–77
lexicon-based approach 73
rule-based approaches 73
statistical methods 73–76
weather and Twitter 357
back-end architecture 358–359, 359f
Big Data system components 358–360
classifier models 365
color degree feature 363
daily data analysis 380–381, 381–382f
DBSCAN cluster algorithm 382–383, 383f
front-end architecture 359, 360f
hourly data analysis 378, 379–381f
impact on emotion 383–384, 386–387f
logistic regression 368–369, 369t, 370f
machine-learning methodology 360–373
in pattern module 363f
random forest 366–368, 367t, 369f
score feature 363
smile detection feature 364
straightforward weather impact on emotion 383–384
support vector machine 365–366, 366–367f
system implementation 373–378
time series 378
XDOM 342–344, 345f
SE scheme  See Searchable encryption (SE) scheme
Sharding 147
Shared memory 
modeling methodology 258–259
on-chip version 244
packing algorithms 253–254
parameter spaces for 244–245
performance 248–250
performance bottlenecks 242, 252, 259–260, 260f
replay method 252
SSD/HDD vs. 245f
storage and 244
Shared-nothing data processing 24
Silhouette coefficient (SC) 373
Single points of failure (SPOF) 240, 251
Single-resource management, in Cloud 166–170
desired resource allocation properties 166–167
long-term resource allocation policy 168–170
LTRF 
experimental evaluation 170, 171f
motivation example 168
scheduling algorithm 168–170
resources-as-you-pay fairness 168
strategy-proofness problem 167
trivial workload problem 167
Skeleton-based matching 315
Small-world networks 121
Small-world phenomenon 121
Smart Grid 417, 426–427
characterizing normal operation 419
cumulative probability distribution 421
identifying unusual phenomena 420–422
improving traditional workflow 418–419
known events identification 423–426
with PMUs and PDCs 418
Smile detection, feature 364
Social networks 
analysis 391
anonymity for 299–300
Big Data and data analytics 270
Cloud-based Big Data collection 
architecture 274, 274f
bounding box tweet retrieval 274, 275f
thin client paradigm 275
correlations in 296–298
graph 296–297, 296f
location-based services 270–271
location privacy 275–281
consequences 280–281
location losing privacy 276
reveal location privacy 276
privacy management 270
social media software systems 
Facebook 272
Flickr 272
Google Plus 271–272
Instagram 272
Twitter 272
tracking users, via tweets 269, 270f
Social recommender systems 82–83
Software stack 360, 361f
Sparse matrix-vector multiplications (SpMVs) 125
AS-BFS 125–127
Spectral clustering (SC) 315
Speculative execution mechanism 219
SPOF  See Single points of failure (SPOF)
SSE scheme  See Searchable symmetric encryption (SSE) scheme
Stanford Rapide project 44
Statistical analysis 53
Statistical data analysis tools 53
Statistical methods, text mining 73–76
Statistics domain 13
Storage 
modeling methodology 258–259
parallel threads in 245
parameter spaces for 244–245, 245f
performance 245–248
Storage resource management (SRM) 163
Stored data analytics platforms 41
Stored data processing platforms 41
Storm 47, 164
Stream 44
Structured Query Language (SQL) 140
Support vector machines (SVMs) 426–427
tweets sentiment analysis 365–366, 366–367f

T

Text mining 
recommender systems 
datasets 83
evaluation metrics for 84
NLP in 85
ranking accuracy 85
social recommender systems 82–83
types 81–82
sentiment analysis 72–73
domain adaptation 76–77
lexicon-based approach 73
rule-based approaches 73
statistical methods 73–76
trending topics 
detection system 79
document pivot method 77
extract n-grams 79–80
Jaccard similarity 80
online clustering 79
ranking clusters 80–81
on Twitter 78f
Text watermarking 295–296
Thread, CUDA 107
Tietjen-Moore test 87
Tiled MapReduce method 240
Time series analysis, weather/Twitter sentiment analysis 372, 378
Transfer error rate 76
Transformation-based optimizations framework (TOF) 447–449, 448f
Translation, NER 71
Trapdoor algorithm 290
Trending topics, text mining 
detection system 79
document pivot method 77–78
extract n-grams 79–80
feature pivot method 77–78
Jaccard similarity 80
online clustering 79
ranking clusters 80–81
on Twitter 78f
Trust- and influence-based links 83
Two-phase commit 142–143

V

Validation procedure, DOM 352–353
Video-on-demand (VoD) 389–390, 398
adaptive video caching framework 396
categories 392
cellular network 392–393
core and edge components 397–398f, 400
data generation 399
INCA caching algorithm 401–402
iProxy 395
Markov processes 393–394
QoE estimation 403
synthetic dataset 409–412
theoretical framework 403–404
wireless request processing 393f
Virtual machines (VMs) 161–162
VoD  See Video-on-demand (VoD)
Voltage deviation 422f
definition 419
normal operation 419, 420–421f
3² Vs of Big Data 13, 14–15f
3Vs of Big Data (Gartner) 7–8, 9f
4Vs of Big Data (IBM) 8, 9f
6Vs of Big Data (Microsoft) 8–9, 9f

W

WaaS  See Workflow-as-a-service (WaaS)
WAMS  See Wide area measurement system (WAMS)
Warp, CUDA 107
Watermarking 
digital 295–296
text 295–296
Wide area measurement system (WAMS) 427
Wireless network analytics, applications of 390f
Wireless service providers (WSPs) 395
WLog program 449–450, 450t
WMSes  See Workflow management systems (WMSes)
Workflow-as-a-service (WaaS) 445–446
Workflow in IaaS clouds 
complex structures 443
declarative optimization engine 449–451
diverse cloud offerings 442
monetary cost optimizations 445–447
resource provisioning 442
transformation-based optimizations framework 447–449, 448f
Workflow management systems (WMSes) 439, 449
WSPs  See Wireless service providers (WSPs)

X

XDOM (eXtension of DOM) 339–340
AskDOM 350
clustering-based summarization framework 344–348, 345f
data sources 340–341, 341f
implementation 350–351
influencer analysis 349
MapReduce framework 342–343
sentiment analysis 343–344, 345f
system architecture 341–342
validation procedure 352–353

Y

Yet Another Resource Negotiator (YARN) 46, 163–164, 183, 205

Z

ZooKeeper 46, 229–230