The definitions include:
- Gartner Group: The “Four V’s” definition: volume, velocity, variety, and veracity. The first three come from Gartner’s original “Three V’s”; veracity is a later addition.
- Oracle: The derivation of value from traditional relational database-driven business decision-making, augmented with new sources of unstructured data such as blogs, social media, sensor networks, and image data.
- Intel: Organizations generating a median of 300 terabytes of data weekly. The data includes business transactions stored in relational databases, documents, e-mail, sensor data, blogs, and social media.
- Microsoft: The process of applying serious computing power, the latest in machine learning and artificial intelligence, to seriously massive and often highly complex sets of information.
- The application definition (arrived at by analyzing the Google Trends results for “big data”): Large volumes of unstructured and/or highly variable data that require the use of several different analysis tools and methods, including text mining, natural language processing, statistical programming, machine learning, and information visualization.
- The Method for an Integrated Knowledge Environment (MIKE2.0) definition: Data characterized by a high degree of permutation and interaction within a dataset, rather than by the size of the dataset. “Big Data can be very small, and not all large datasets are Big.”
- NIST: Data that exceeds the capacity or capability of current or conventional [analytic] methods and systems.