Overview of Data Types
Data is a foundation for understanding complex systems, making informed decisions, and driving innovation. Recognizing the type of data you’re working with is crucial. It determines how you can summarize it, visualize it, and extract meaningful insights. By thinking carefully about data types, we can avoid misleading conclusions, choose the right analytical tools, and unlock the full potential of the information at hand.
Tabular
Tabular data is structured in rows and columns, much like a spreadsheet or a database table. Each row typically represents an observation, and each column represents a variable or feature. This format is one of the most common in data analysis and is used in tasks like regression, classification, and exploratory data analysis. Examples include survey results, financial records, and experiment logs.
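As a minimal sketch of the row-and-column structure described above, the following Python snippet reads a small table and summarizes one column. The survey values and column names (`respondent_id`, `age`, `satisfaction`) are invented for illustration and use only the standard library.

```python
import csv
import io
from statistics import mean

# Hypothetical survey results: each row is one respondent (observation),
# each column is one variable.
raw = """respondent_id,age,satisfaction
1,34,4
2,51,5
3,27,3
"""

# DictReader uses the header line as column names, one dict per row.
rows = list(csv.DictReader(io.StringIO(raw)))

# Column-wise summary: average of the "age" variable across observations.
mean_age = mean(int(row["age"]) for row in rows)

print(len(rows), "observations,", len(rows[0]), "variables")
print("mean age:", mean_age)
```

In practice a dedicated table library would usually replace the manual parsing, but the idea of observations as rows and variables as columns stays the same.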
Text
Text data includes any data in written language, ranging from short phrases to full documents. It can come from diverse sources such as interviews, survey responses, articles, or social media posts. In some cases, text must first be extracted from other formats, such as PDFs; the section on PDFs covers the data wrangling involved. Analyzing text often involves preprocessing steps like tokenization, followed by tasks such as sentiment analysis, topic modeling, or text classification. The NLP page covers these in more detail.
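Tokenization, the preprocessing step mentioned above, can be sketched in a few lines of Python. This is a deliberately naive version that assumes English text and splits on anything that is not a letter or apostrophe; the example sentence is invented, and real projects typically rely on a dedicated NLP library.

```python
import re
from collections import Counter

# Hypothetical survey response, invented for illustration.
text = "The course was great. The exercises were great, too!"

# Naive tokenization: lowercase, then keep runs of letters and apostrophes.
tokens = re.findall(r"[a-z']+", text.lower())

# Token counts are a common starting point for later analysis.
counts = Counter(tokens)

print(tokens)
print(counts.most_common(2))
```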
Visual
Visual data includes images, videos, and other forms of visual representations that convey information through patterns, shapes, or spatial structure. This can encompass a wide range of content such as medical imagery, satellite views, or rendered scenes. Analyzing visual data often involves tasks like classification, segmentation, or object detection, using approaches from computer vision and image analysis.
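To make the idea of segmentation concrete, here is a toy Python sketch that treats an image as a grid of pixel intensities and separates bright pixels from dark ones with a fixed threshold. The 4x4 image and the cut-off value are assumptions chosen by hand; this is the simplest possible form of segmentation, not a stand-in for the computer vision methods used in practice.

```python
# Hypothetical 4x4 grayscale image; values are pixel intensities (0-255).
image = [
    [12, 20, 200, 210],
    [15, 25, 220, 205],
    [10, 18, 30, 22],
    [14, 16, 28, 24],
]

THRESHOLD = 128  # assumed cut-off, chosen by hand for this toy image

# Threshold segmentation: 1 marks bright foreground pixels, 0 background.
mask = [[1 if pixel > THRESHOLD else 0 for pixel in row] for row in image]

# Summing the mask counts how many pixels were labeled foreground.
foreground_pixels = sum(sum(row) for row in mask)

print(mask)
print("foreground pixels:", foreground_pixels)
```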
- Common Limitations of Image Processing Metrics: A Picture Story - Paper (2023)
Time Series
Time series data consists of observations recorded over time, at regular or irregular intervals. It captures how variables evolve, making it particularly useful in domains like finance, health monitoring, environmental science, and sensor-based measurement. Key methods include forecasting, anomaly detection, and trend analysis.
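One simple way to approach anomaly detection is to compare each observation with the average of the observations just before it. The Python sketch below does this for a short, invented series of sensor readings; the window size and tolerance are assumptions picked for this example, and real applications usually need more robust methods.

```python
from statistics import mean

# Hypothetical sensor readings recorded at regular intervals.
readings = [20.1, 20.3, 20.2, 20.4, 27.9, 20.5, 20.3]

WINDOW = 3       # number of previous observations to average (assumed)
TOLERANCE = 3.0  # allowed deviation from the trailing mean (assumed)

anomalies = []
for i in range(WINDOW, len(readings)):
    # Average of the WINDOW observations immediately before position i.
    trailing_mean = mean(readings[i - WINDOW:i])
    if abs(readings[i] - trailing_mean) > TOLERANCE:
        anomalies.append(i)

print("anomalous positions:", anomalies)
```

Note that a spike also distorts the trailing mean for the next few positions, which is one reason more careful approaches are preferred for real data.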
Network Data
Network data represents relationships or interactions between entities, often visualized as graphs with nodes (entities) and edges (connections). This type of data is prevalent in social networks, transportation systems, biological networks, and communication networks. Analyzing network data involves techniques like centrality measures, network visualization, and network modeling.
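As an illustration of a centrality measure, the following Python sketch computes degree centrality, the share of other nodes each node is directly connected to, from an edge list. The friendship network and its names are invented for this example; dedicated graph libraries offer this and many other measures.

```python
from collections import defaultdict

# Hypothetical undirected friendship network as an edge list.
edges = [("Ana", "Ben"), ("Ana", "Cem"), ("Ana", "Dia"), ("Ben", "Cem")]

# Adjacency sets: each node maps to the set of its neighbours.
neighbours = defaultdict(set)
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)

# Degree centrality: number of neighbours divided by the maximum possible (n - 1).
n = len(neighbours)
centrality = {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Print nodes from most to least central.
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 2))
```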
- A Connected World: Data Analysis for Real-World Network Data - GitHub Repository
- A Connected World: Data Analysis for Real-World Network Data - BERD Academy Information & Registration Page
- A Connected World: Data Analysis for Real-World Network Data - BERD Academy module (2023), slides and R script
Mobile Phone Data
Mobile phone data encompasses information collected from mobile devices, including call records, text messages, app usage, and location data. This type of data is often used in studies related to human behavior, social networks, and mobility patterns. Analyzing mobile phone data requires careful consideration of privacy and ethical issues.
- Data Challenge: Mobile Phone Data - GitHub Repository
- Data Challenge: Mobile Phone Data - BERD Academy Information & Registration Page
- Data Challenge: Mobile Phone Data - BERD Academy module (2024), slides
Once we recognize the type of data we’re working with, the next step is data wrangling — preparing and transforming data so it’s ready for analysis.