Data Science Research Areas
Research Areas:
Big Data
‘Big data refers to data that would typically be too expensive to store, manage, and analyze using traditional (relational and/or monolithic) database systems. Usually, such systems are cost-inefficient because of their inflexibility for storing unstructured data (such as images, text, and video), accommodating “high-velocity” (real-time) data, or scaling to support very large (petabyte-scale) data volumes.’
‘Data analytics only returns more value when you have access to more data, so organizations across multiple industries have found big data to be a rich resource for uncovering profound business insights. And, because machine-learning models get more efficient as they are “trained” with more data, machine learning and big data are highly complementary.’ https://cloud.google.com/what-is-big-data
Four challenges have to be considered:
Volume (the sheer amount of data; 2025 is expected to see eight times more data than 2017[2])
Velocity (the speed at which data is generated and processed, e.g. streaming, IoT, social media)
Variety (structured and, increasingly, unstructured data)
Veracity (poor data quality and a lack of know-how for evaluating it)
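To put the Volume figure into perspective, the eightfold increase between 2017 and 2025 can be converted into an annual growth rate. The short Python sketch below does this under the assumption (not stated in the source) that growth compounds at a constant rate each year; the variable names are illustrative only.

```python
import math

# Figure taken from the Volume item above: 8x more data in 2025 than in 2017.
growth_factor = 8.0
years = 2025 - 2017

# Assumption: growth compounds at a constant annual rate over the whole period.
annual_rate = growth_factor ** (1 / years) - 1

# Time needed for the data volume to double at that rate.
doubling_time_years = math.log(2) / math.log(1 + annual_rate)

print(f"Implied annual growth: {annual_rate:.1%}")
print(f"Implied doubling time: {doubling_time_years:.2f} years")
```

Under that assumption, the data volume grows by roughly 30% per year and doubles about every 2.7 years.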
Typical characteristics of Big Data (Storage) Technologies are:
Distributed Storage
Data Replication
Local Data Processing
High Availability
Data Partitioning
Denormalized Data
Working with structured and unstructured data
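Two of the characteristics listed above, data partitioning and data replication, can be illustrated with a minimal Python sketch. It is a simplified toy model, not how any particular storage system works: the node names, the helper functions partition_for and replica_nodes, and the round-robin placement rule are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def replica_nodes(partition: int, nodes: List[str], replication_factor: int) -> List[str]:
    """Pick distinct nodes for a partition by walking the node list (toy placement rule)."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return [nodes[(partition + i) % len(nodes)] for i in range(replication_factor)]


# Hypothetical four-node cluster; every partition is stored on three nodes.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement: Dict[str, List[str]] = {}
for key in ["user:1", "user:2", "video:42"]:
    partition = partition_for(key, num_partitions=8)
    placement[key] = replica_nodes(partition, nodes, replication_factor=3)

for key, replicas in placement.items():
    print(key, "->", replicas)
```

Because each key is stored on three different nodes, the data stays readable if one node fails, which is the link between replication and high availability. Spreading keys across partitions is what lets storage and processing scale out across machines.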
Big Data: the point at which the amount of data reaches terabytes or petabytes, or at which traditional systems are no longer powerful enough and become significantly more expensive when working with data at this scale.
Even for analytics:
If there is any single guarantee, it’s that your data will grow over time, probably exponentially.
For ML products:
Another point is that models in the emerging field of machine learning and deep learning generally become more efficient as they are trained with more data. The field is therefore a natural complement to Big Data.
https://towardsdatascience.com/what-big-data-actually-means-d4b00e8ae00