Overview of Data Types

Data is the foundation for understanding complex systems, making informed decisions, and driving innovation. Recognizing the type of data you’re working with is crucial, because it determines how you can summarize it, visualize it, and extract meaningful insights from it. Thinking carefully about data types helps us avoid misleading conclusions, choose the right analytical tools, and make full use of the information at hand.

Tabular

Tabular data is structured in rows and columns, much like a spreadsheet or a database table. Each row typically represents an observation, and each column represents a variable or feature. This format is one of the most common in data analysis and is used in tasks like regression, classification, and exploratory data analysis. Examples include survey results, financial records, and experiment logs.
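To make the row/column structure concrete, here is a minimal sketch using only the Python standard library. The survey data, column names, and values are hypothetical. It reads a small CSV table, treats each row as one observation, and summarizes each column according to its type: a mean for the numeric variable and counts for the categorical one.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical survey results: each row is one respondent (observation),
# each column is one variable.
RAW = """respondent,age,smoker
1,34,no
2,51,yes
3,28,no
4,45,no
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Column-wise access: collect one variable across all observations.
ages = [int(r["age"]) for r in rows]
smokers = Counter(r["smoker"] for r in rows)

# A numeric column is summarized with a mean, a categorical one with counts.
print(f"{len(rows)} observations, {len(rows[0])} variables")
print("mean age:", mean(ages))
print("smoker counts:", dict(smokers))
```

For larger tables, a dedicated library such as pandas offers the same row/column view with far more tooling; the standard-library version only illustrates the structure.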

Text

Text data includes any data in written language, ranging from short phrases to full documents. It can come from diverse sources such as interviews, survey responses, articles, or social media posts. In some cases, text must first be extracted from other formats such as PDFs; the section on PDFs covers this data wrangling step. Analyzing text usually starts with preprocessing steps such as tokenization, followed by tasks like sentiment analysis, topic modeling, or text classification. The NLP page covers these tasks in more detail.
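Tokenization splits raw text into units, typically words, that later steps can count or model. The sketch below is a deliberately simple, hypothetical regex tokenizer, not the method of any particular NLP library: it lowercases the text, keeps runs of ASCII letters, digits, and apostrophes, and then counts token frequencies.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Matches ASCII letters, digits and apostrophes only, so punctuation and
    whitespace are dropped (and accented letters would be split off).
    """
    return re.findall(r"[a-z0-9']+", text.lower())


# Hypothetical survey response.
response = "The app is great, but the app crashes too often!"
tokens = tokenize(response)
print(tokens)
print(Counter(tokens).most_common(2))
```

Real pipelines usually rely on library tokenizers that handle Unicode, contractions, and subwords; the counting step, however, is the same idea behind bag-of-words features.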

Visual

Visual data includes images, videos, and other visual representations that convey information through patterns, shapes, or spatial structure. Examples range from medical imagery and satellite views to rendered scenes. Analyzing visual data often involves tasks such as classification, segmentation, or object detection, using approaches from computer vision and image analysis.

Further Keywords

image-level classification, semantic segmentation, object detection, instance segmentation, pitfalls, counting metrics, multi-threshold metrics, distance-based metrics, MCC, Cohen’s Kappa, sensitivity, specificity, AUROC, positive predictive value, DSC, IoU, Hausdorff distance, calibration metrics, FROC, mismatch, high class imbalance

# visual # paper # Common Limitations of Image Processing Metrics: A Picture Story # 2023 # paper
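DSC (Dice similarity coefficient) and IoU (intersection over union), listed in the keywords above, both measure the overlap between a predicted and a reference segmentation: DSC = 2|A ∩ B| / (|A| + |B|) and IoU = |A ∩ B| / |A ∪ B|. A minimal sketch in plain Python, assuming binary masks given as nested lists of 0/1 values; the masks and helper names are hypothetical. How to score two empty masks is a convention choice, and this sketch returns NaN rather than picking one.

```python
def _flatten(mask):
    """Turn a 2-D mask (list of rows) into a flat list of pixels."""
    return [px for row in mask for px in row]


def overlap_counts(pred, ref):
    """Return (|pred AND ref|, |pred|, |ref|) for two binary masks of equal size."""
    p, r = _flatten(pred), _flatten(ref)
    if len(p) != len(r):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for a, b in zip(p, r) if a and b)
    return inter, sum(1 for a in p if a), sum(1 for b in r if b)


def dice(pred, ref):
    inter, n_pred, n_ref = overlap_counts(pred, ref)
    denom = n_pred + n_ref
    # Both masks empty: the score is undefined, so return NaN.
    return 2 * inter / denom if denom else float("nan")


def iou(pred, ref):
    inter, n_pred, n_ref = overlap_counts(pred, ref)
    union = n_pred + n_ref - inter
    return inter / union if union else float("nan")


reference = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
prediction = [[0, 1, 1],
              [0, 1, 0],
              [0, 1, 0]]
print("DSC:", dice(prediction, reference))
print("IoU:", iou(prediction, reference))
```

Here 3 of 4 reference pixels are hit and 1 predicted pixel is spurious, giving DSC 0.75 but IoU 0.6. The two metrics rank segmentations identically (DSC = 2·IoU / (1 + IoU)), yet report different numbers for the same result, so it matters which one a paper uses.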

Time Series

Time series data consists of observations recorded over time, at regular or irregular intervals. It captures how variables evolve, which makes it particularly useful in fields such as finance, health monitoring, and environmental science, and for sensor data. Key methods include forecasting, anomaly detection, and trend analysis.
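Two of these methods can be sketched in a few lines. A trailing moving average smooths short-term noise to expose the trend, and a rolling z-score rule flags points that lie far from the recent history. This is a minimal sketch assuming evenly spaced observations; the readings, window sizes, function names, and the threshold of 3 standard deviations (a common rule of thumb) are all illustrative choices, not recommendations.

```python
from statistics import mean, stdev


def moving_average(values, window):
    """Trailing moving average: one value per full window of observations."""
    return [mean(values[i - window + 1 : i + 1])
            for i in range(window - 1, len(values))]


def flag_anomalies(values, window, threshold=3.0):
    """Return indices of points lying more than `threshold` standard
    deviations away from the mean of the preceding `window` points."""
    if window < 2:
        raise ValueError("window must contain at least two points")
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window : i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical, evenly spaced sensor readings with one spike at index 7.
readings = [20.1, 20.3, 20.2, 20.4, 20.6, 20.5, 20.7, 27.9, 20.8, 20.9]
trend = moving_average(readings, window=3)
anomalies = flag_anomalies(readings, window=5)
print("smoothed trend:", [round(x, 2) for x in trend])
print("anomalous indices:", anomalies)
```

Once the spike enters the history window it inflates the standard deviation for the following points, which can mask nearby anomalies; median-based rules are a common, more robust alternative.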

Network Data

Network data represents relationships or interactions between entities, often visualized as graphs with nodes (entities) and edges (connections). This type of data is prevalent in social networks, transportation systems, biological networks, and communication networks. Analyzing network data involves techniques such as computing centrality measures, visualizing the network, and fitting statistical models of the network.

Further Keywords

visualization of networks, Fruchterman-Reingold algorithm, graph density, node degree, degree centrality, modelling networks, exponential random graph models (ERGM), latent variable models (LVM)

# network # BERD Academy module # A Connected World: Data Analysis for Real-World Network Data # 2023 # slides # R script
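Graph density, node degree, and degree centrality from the keywords above can be computed directly from an edge list. A minimal sketch in plain Python, assuming a small, hypothetical, undirected simple graph (no self-loops) in which every node has at least one edge. Density is the share of possible ties that are present, 2m / (n(n − 1)); degree centrality here is a node's degree divided by n − 1, the same normalization NetworkX uses in `degree_centrality`.

```python
from collections import defaultdict

# Hypothetical undirected friendship network given as an edge list.
edges = [("Ana", "Ben"), ("Ana", "Cem"), ("Ana", "Dia"), ("Ben", "Cem")]

# Adjacency sets: each undirected edge is recorded for both endpoints,
# and duplicate edges collapse automatically.
neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

n = len(neighbors)                                      # nodes appearing in some edge
m = sum(len(nbrs) for nbrs in neighbors.values()) // 2  # distinct edges

density = 2 * m / (n * (n - 1))                         # share of possible ties present
degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
degree_centrality = {node: d / (n - 1) for node, d in degree.items()}

print("density:", round(density, 3))
print("degree:", degree)
print("degree centrality:", {k: round(c, 2) for k, c in degree_centrality.items()})
```

Ana is tied to every other node, so her degree centrality is 1.0, while Dia, with a single tie, scores 1/3.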

Mobile Phone Data

Mobile phone data encompasses information collected from mobile devices, including call records, text messages, app usage, and location data. This type of data is often used in studies related to human behavior, social networks, and mobility patterns. Analyzing mobile phone data requires careful consideration of privacy and ethical issues.

Further Keywords

mobile network data, official statistics, gravity models, anonymization, commuting patterns, public transport planning, legal framework, infrastructure quality, welfare effects

# mobile # BERD Academy module # Data Challenge: Mobile Phone Data # 2024 # slides
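Gravity models, listed in the keywords, estimate the flow of people between two regions as increasing with their populations and decreasing with the distance between them. A minimal sketch of the basic form T_ij = k · P_i · P_j / d_ij^β follows; the districts, populations, distances, and the values of k and β are all hypothetical. In practice these parameters are commonly calibrated against observed flows, for example commuting flows derived from mobile network data, often by regressing log flows on log populations and log distance.

```python
from itertools import combinations


def gravity_flow(pop_i, pop_j, distance_km, k=1e-4, beta=2.0):
    """Basic gravity model: T_ij = k * P_i * P_j / d_ij ** beta.

    k and beta are placeholder values chosen for illustration; in a real
    study they are estimated from observed flows.
    """
    return k * pop_i * pop_j / distance_km ** beta


# Hypothetical districts with populations and pairwise distances in km.
population = {"A": 120_000, "B": 45_000, "C": 80_000}
distances = {
    frozenset({"A", "B"}): 10.0,
    frozenset({"A", "C"}): 25.0,
    frozenset({"B", "C"}): 18.0,
}

# This basic form is symmetric (T_ij == T_ji), so each pair is computed once.
for i, j in combinations(population, 2):
    d = distances[frozenset({i, j})]
    flow = gravity_flow(population[i], population[j], d)
    print(f"{i} <-> {j}: about {flow:,.0f} trips")
```

Extended variants use separate exponents for origin and destination size, which makes the predicted flows direction-dependent.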

Once we recognize the type of data we’re working with, the next step is data wrangling: preparing and transforming the data so it’s ready for analysis.