Introduction to Software Systems

Unstructured Databases

This section introduces the concepts behind unstructured databases.


Last updated 3 months ago

Unstructured databases are designed to store and manage unstructured data, which is information that lacks a predefined format or structure. Unlike structured data, which is organized in tables with rows and columns, unstructured data can come in various forms, including text documents, images, videos, emails, social media posts, and more. This type of data is prevalent in modern organizations, with estimates suggesting that 80% to 90% of data generated today is unstructured.

Characteristics of Unstructured Data

  • Lack of Fixed Format: Unstructured data does not fit neatly into traditional database schemas, making it challenging to categorize and analyze.

  • Variety of Formats: It encompasses a wide range of formats such as:

    • Text documents (e.g., reports, emails)

    • Multimedia files (e.g., images, audio, video)

    • Social media content (e.g., posts, comments)

    • Web pages and blogs

  • Volume: Unstructured data typically accounts for a much larger volume than structured data, which adds to the challenges of storage and analysis.

Storage Solutions for Unstructured Data

Unstructured databases utilize various technologies to manage this type of data effectively:

  • NoSQL Databases: These databases (e.g., MongoDB) are designed to handle unstructured data without the constraints of a fixed schema. They allow for flexible data storage and retrieval.

  • Data Lakes: These are centralized repositories that store vast amounts of raw unstructured data in its native format until needed for analysis.

  • Data Warehouses: While traditionally used for structured data, some modern warehouses can accommodate unstructured data as well.

Challenges in Managing Unstructured Data

  1. Storage Complexity: Storing unstructured data requires significant space and can be costly compared to structured data storage solutions.

  2. Difficulty in Analysis: Traditional analytics tools are often ineffective on unstructured sources, because the data lacks the predefined attributes and structure those tools rely on.

  3. Indexing Issues: Searching and indexing unstructured data can be error-prone and less accurate, because the content is ambiguous and has no fixed fields to index on.
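
To make the indexing challenge concrete, here is a minimal sketch of an inverted index over free text, one common way to make unstructured text searchable. The sample records and the function names (`tokenize`, `build_inverted_index`, `search`) are assumptions made up for this illustration, not part of any particular database.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical free-text records (an email, a post, a report snippet).
docs = {
    "email-1": "Quarterly report attached, please review the sales figures.",
    "post-7": "Loved the new release! Sales are going to be great.",
    "report-3": "The report covers storage costs for video archives.",
}

index = build_inverted_index(docs)
print(sorted(search(index, "report")))        # documents mentioning "report"
print(sorted(search(index, "sales report")))  # documents mentioning both words
```

Because this sketch matches exact words only, a query for "sale" finds nothing even though two records mention "sales". Handling such variations (word stems, synonyms, typos) is part of what makes indexing unstructured data harder than indexing fixed columns.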

NoSQL databases are specifically designed to manage unstructured data by utilizing various flexible data models that accommodate the diverse formats and structures of this type of information. Here’s how they achieve this:

Key Approaches in NoSQL Databases

  1. Data Models:

    • Key-Value Stores: These databases manage data as a collection of key-value pairs, where each unique key maps to a single data item. This model is straightforward and allows for the storage of any type of data, including text, images, and videos.

    • Document Stores: Data is stored in documents (often in formats like JSON or BSON), which can contain varying fields and structures. This flexibility makes it easy to accommodate changing data requirements without a fixed schema.

    • Graph Databases: These databases excel at managing relationships between data points, making them ideal for unstructured data that involves complex interconnections, such as social media interactions.

    • Wide-Column Stores: Similar to relational databases but with more flexibility, these databases allow different rows to have different columns, which makes it easier to store diverse unstructured data types.

  2. Scalability:

    • NoSQL databases are designed to scale horizontally by distributing data across multiple servers or clusters. This approach improves performance and allows large volumes of unstructured data to be handled efficiently.

  3. Schema Flexibility:

    • Unlike traditional relational databases that require a predefined schema, NoSQL databases can adapt to changes in data structure on the fly. This capability is particularly beneficial for applications that need to evolve rapidly or handle unpredictable data formats.

  4. Performance Optimization:

    • Many NoSQL solutions prioritize low latency and high availability by relaxing some of the consistency guarantees typical of relational databases. This design choice allows for faster read and write operations, which is crucial when dealing with large datasets.

  5. Data Handling:

    • NoSQL databases store unstructured data as it arrives, without the extensive transformation steps that relational systems often require. This direct handling reduces complexity and improves efficiency in data processing.
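
The four data models listed above can be imitated with plain Python structures. This is only a rough sketch to show the shape of the data in each model; the variable names, sample records, and the `followed_by` helper are hypothetical, and real NoSQL engines add persistence, indexing, and distribution on top.

```python
# 1. Key-value store: each unique key maps to one opaque value.
key_value_store = {
    "user:1:avatar": b"\x89PNG\r\n",          # raw bytes (e.g. the start of an image)
    "user:1:bio": "Enjoys hiking and chess",  # plain text
}

# 2. Document store: self-describing records whose fields may differ.
document_store = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "tags": ["admin"], "address": {"city": "Pune"}},
]

# 3. Graph database: nodes plus labelled edges between them.
graph_edges = [
    ("Asha", "FOLLOWS", "Ravi"),
    ("Ravi", "FOLLOWS", "Meera"),
    ("Asha", "LIKES", "post:7"),
]

# 4. Wide-column store: each row key owns its own set of columns.
wide_column_store = {
    "row-1": {"name": "Asha", "city": "Hyderabad"},
    "row-2": {"name": "Ravi", "last_login": "2024-01-05", "device": "mobile"},
}

def followed_by(edges, person):
    """Return everyone that `person` follows, in edge order."""
    return [dst for src, rel, dst in edges if src == person and rel == "FOLLOWS"]

print(key_value_store["user:1:bio"])
print([sorted(doc) for doc in document_store])   # field names differ per document
print(followed_by(graph_edges, "Asha"))
print({row: sorted(cols) for row, cols in wide_column_store.items()})
```

Note how the two documents and the two wide-column rows carry different fields without any schema change, while the graph model answers a relationship question ("whom does Asha follow?") directly from its edges.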

In summary, NoSQL databases provide an effective solution for managing unstructured data through their flexible architectures, scalability options, and ability to handle diverse data types without rigid schemas. This makes them particularly suitable for modern applications that require agility and speed in processing large amounts of unstructured information.
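
As an illustration of the horizontal scaling mentioned above, the following sketch spreads keys across several in-memory "shards" using a hash of the key. `ShardedStore` and `shard_for` are hypothetical names for this example. Production systems typically use more refined schemes (such as consistent hashing or range-based partitioning), because the simple modulo approach shown here remaps most keys whenever the number of shards changes.

```python
import hashlib

def shard_for(key, num_shards):
    """Pick a shard index from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory shards."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate server.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)

store = ShardedStore(num_shards=3)
for i in range(9):
    store.put(f"user:{i}", {"id": i})

print(store.get("user:4"))                      # looked up on exactly one shard
print([len(shard) for shard in store.shards])   # how the 9 keys were spread
```

Each read or write touches only the shard that owns the key, which is why adding servers can increase total capacity and throughput.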

MongoDB is an open source, document-oriented database designed with both scalability and developer agility in mind. Instead of storing your data in tables and rows as you would with a relational database, in MongoDB you store JSON-like documents with dynamic schemas (also described as schema-free or schemaless).

MongoDB does not need a predefined data schema. Every document in a collection can have different fields!
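
The idea of documents with different fields living in one collection can be sketched without a database server. In the example below a Python list stands in for a MongoDB collection, and `find` is a hypothetical helper that imitates a simple equality query; the product records are invented sample data. With the official Python driver (pymongo), the analogous operations would be `insert_many` and `find` on a collection object, assuming a running MongoDB instance.

```python
import json

# A plain list standing in for a MongoDB collection (assumption for this sketch).
products = [
    {"_id": 1, "type": "book", "title": "Book A", "pages": 320},
    {"_id": 2, "type": "video", "title": "Video B", "duration_s": 780,
     "captions": ["en", "hi"]},
    {"_id": 3, "type": "book", "title": "Book C",
     "author": {"first": "Jane", "last": "Doe"}},
]

def find(collection, query):
    """Return documents whose top-level fields equal every key/value in `query`."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]

books = find(products, {"type": "book"})
print([doc["title"] for doc in books])
print(json.dumps(products[1]))   # each document serializes to JSON as-is
```

The two book documents match the same query even though one has a `pages` field and the other has a nested `author` field, which is the flexibility a dynamic schema provides.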

Figure: Comparison of SQL database types and NoSQL database types
Figure: Comparison of SQL table schema definition vs. NoSQL schema definition
Figure: Different types of documents that can reside in the same collection in document databases