NoSQL document stores are great for managing large amounts of unstructured data. However, some organizations work with unstructured data but still need the functionality that comes with a traditional SQL database. For example, a media or news content agency may operate a high-traffic website centered on large amounts of text and image content. It needs to store this unstructured data, but it probably doesn’t need the flexible schema and horizontal scalability that NoSQL databases provide. Instead, it wants the ease of database management and the consistency that come with a relational database like PostgreSQL.
Is it possible to get the best of both worlds? Yes.
With data types intended to support unstructured data, PostgreSQL provides a convenient medium for leveraging NoSQL capabilities within a cost-effective and easy-to-manage relational database. This article shows how to work with unstructured data using HStore and JSONB data types in PostgreSQL.
Before we dive in, let’s take a quick look at the main differences between SQL and NoSQL databases.
Understanding SQL and NoSQL
SQL and NoSQL databases each have their own strengths and weaknesses. Being able to make an informed decision about which one best suits your data needs depends on a good understanding of the differences between the two.
SQL (relational) databases, such as PostgreSQL and MySQL, represent data with a clear and predictable structure in tables, rows, and columns. They adhere to the ACID properties (atomicity, consistency, isolation, and durability), which create a strong foundation for data integrity by ensuring that database transactions are processed reliably.
SQL databases excel when data consistency and integrity are critical, such as when dealing with complex queries or transactional systems (such as financial applications).
In contrast, NoSQL databases, including document stores, cater to large and diverse datasets that are not necessarily suited to tabular representation. Examples of NoSQL databases include MongoDB, Cassandra, and Couchbase. NoSQL databases work with flexible schemas, allowing the data structure to evolve over time. They also support horizontal scalability, distributing data across multiple servers to better handle large data loads and high traffic.
NoSQL databases are often used in applications where scalability is critical, such as real-time applications and large language model (LLM) workloads that process large amounts of data. NoSQL databases are also beneficial when dealing with diverse and evolving data structures, as they allow organizations to adapt to changing data needs.
Why use PostgreSQL as a document store?
PostgreSQL is a relational database, so it might seem unconventional to think of it as an option for NoSQL needs. However, there are situations in which using PostgreSQL as your document store is appropriate.
Perhaps you have diverse data storage needs, requiring both structured, ACID-compliant data storage and flexible, schemaless document storage. With PostgreSQL, you can combine relational and non-relational models. Or perhaps you want certain NoSQL capabilities, but you also want the data consistency guarantees that come with ACID properties. Lastly, as a mature technology with a vibrant community, PostgreSQL provides comprehensive SQL support, advanced indexing, and full-text search. Combining these features with NoSQL capabilities makes PostgreSQL a versatile data storage solution.
Limitations when using PostgreSQL for NoSQL-style data
Despite its versatility, PostgreSQL has certain limitations compared to traditional NoSQL databases. While PostgreSQL can scale vertically, it does not inherently support horizontal scaling or distributed data with automatic sharding, features that NoSQL databases typically provide. PostgreSQL also does not offer optimizations for certain NoSQL data structures, such as wide-column stores or graph databases. Finally, PostgreSQL does not offer tunable consistency for optimizing performance, which you may get from some NoSQL databases.
When considering using PostgreSQL for large unstructured data sets, be aware that these limitations can impact performance and scalability. Additionally, mixing SQL and NoSQL data manipulation introduces complexity. Careful planning and understanding of both paradigms can help you avoid potential pitfalls.
However, with the right understanding and use cases, PostgreSQL can serve as a powerful tool that offers the best of both SQL and NoSQL.
HStore and JSONB in PostgreSQL
Looking at the possibilities for using PostgreSQL as a NoSQL solution, we find three data types that offer NoSQL-like functionality, each with its own characteristics and use cases.
- HStore: This data type allows you to store key-value pairs in a single PostgreSQL value. This is useful for storing semi-structured data that doesn’t have a fixed schema.
- JSONB: This is a binary representation of JSON-like data. It can store more complex structures than HStore and supports full JSON functionality. JSONB is indexable and therefore suitable for large amounts of data.
- JSON: This is similar to JSONB, but lacks many of JSONB’s features and efficiencies. The JSON data type stores an exact copy of the input text, including whitespace and duplicate keys.
We mention the JSON data type because it is a valid choice for storing JSON-formatted data when you don’t need all the features that JSONB provides. However, the rest of this article focuses on HStore and JSONB.
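The difference between the two JSON types is easiest to see side by side. The following is a minimal sketch that casts the same illustrative literal (not taken from any table in this article) to both types:

```sql
-- json stores the input text exactly as given: the extra whitespace,
-- the key order, and the duplicate "a" key all survive.
SELECT '{"b": 1,   "a": 2, "a": 3}'::json AS as_json;

-- jsonb parses the input into a binary form: whitespace is normalized,
-- keys are reordered, and only the last duplicate key is kept.
SELECT '{"b": 1,   "a": 2, "a": 3}'::jsonb AS as_jsonb;
-- as_jsonb: {"a": 3, "b": 1}
```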
HStore
The PostgreSQL documentation describes HStore as useful when you have “rows with many attributes that are rarely examined, or semi-structured data”. Before working with the HStore data type, make sure to enable the hstore extension.
> CREATE EXTENSION hstore;
An HStore is represented as zero or more key => value pairs separated by commas. The order of the pairs is not significant and is not guaranteed to be preserved on output.
> SELECT 'foo => bar, prompt => "hello world", pi => 3.14'::hstore;
hstore
-----------------------------------------------------
"pi"=>"3.14", "foo"=>"bar", "prompt"=>"hello world"
(1 row)
Each HStore key is unique. If an HStore is declared with duplicate keys, only one of the duplicates is stored, and there is no guarantee which one is kept.
> SELECT 'key => value1, key => value2'::hstore;
hstore
-----------------
"key"=>"value1"
(1 row)
HStore offers simplicity and fast queries due to its flat key-value structure, making it ideal for simple scenarios. However, HStore only supports text data and does not support nested data, so it is of limited use for complex data structures.
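To illustrate the text-only limitation, here is a small sketch using a literal value rather than a table. It assumes the hstore extension is already enabled:

```sql
-- Every hstore value is text, even when it looks like a number.
SELECT pg_typeof('pi => 3.14'::hstore -> 'pi') AS value_type;   -- text

-- Cast explicitly before doing arithmetic or numeric comparisons.
SELECT ('pi => 3.14'::hstore -> 'pi')::numeric * 2 AS doubled;  -- 6.28
```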
JSONB, on the other hand, can handle a wider variety of data types.
JSONB
The JSONB data type accepts input text in JSON format and stores it in decomposed binary format. This conversion slows down the input slightly, but results in faster processing and efficient indexing. JSONB does not preserve whitespace or order of object keys.
> SELECT '{"foo": "bar", "pi": 3.14, "nested": { "prompt": "hello", "count": 5 } }'::jsonb;
jsonb
-----------------------------------------------------------------------
{"pi": 3.14, "foo": "bar", "nested": {"count": 5, "prompt": "hello"}}
(1 row)
If duplicate object keys are specified, the last value is kept.
> SELECT '{"key": "value1", "key": "value2"}'::jsonb;
jsonb
-------------------
{"key": "value2"}
(1 row)
JSONB supports complex structures and full JSON functionality, making it an ideal choice for complex or nested data and recommended over HStore and JSON. However, using JSONB incurs performance overhead and increases storage usage compared to HStore.
Practical Example: Working with HStore and JSONB
Let’s consider some real world examples of how to work with these data types. We’ll look at creating tables, basic queries and operations, and indexing.
Basic HStore operations
As with any other data type, you can define columns in PostgreSQL tables with the HStore data type.
> CREATE TABLE articles ( id serial primary key, title varchar(64), meta hstore );
Inserting a record with the HStore attribute looks like this:
> INSERT INTO articles (title, meta)
VALUES (
'Data Types in PostgreSQL',
'format => blog, length => 1350, language => English, license => "Creative Commons"');
> SELECT * FROM articles;
id | title | meta
----+--------------------------+------------------------------------------
1 | Data Types in PostgreSQL | "format"=>"blog", "length"=>"1350", "license"=>"Creative Commons", "language"=>"English"
(1 row)
With HStore fields, you can fetch the value stored under a specific key by using the -> operator.
> SELECT title, meta -> 'license' AS license, meta -> 'format' AS format FROM articles;
title | license | format
---------------------------------+------------------+------------
Data Types in PostgreSQL | Creative Commons | blog
Advanced Querying in PostgreSQL | None | blog
Scaling PostgreSQL | MIT | blog
PostgreSQL Fundamentals | Creative Commons | whitepaper
(4 rows)
You can also query with conditions based on specific values in HStore fields.
> SELECT id, title FROM articles WHERE meta -> 'license' = 'Creative Commons';
id | title
----+--------------------------
1 | Data Types in PostgreSQL
4 | PostgreSQL Fundamentals
(2 rows)
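Two other kinds of condition are worth sketching against the same articles table. Because HStore values are text, numeric comparisons need a cast, and the @> containment operator matches several pairs at once. The threshold and the pairs below are illustrative choices only:

```sql
-- Values are text, so cast before comparing numerically.
-- The cast raises an error if any row holds a non-numeric length.
SELECT id, title
FROM articles
WHERE (meta -> 'length')::integer > 1000;

-- Containment: rows whose meta includes all of the given pairs.
SELECT id, title
FROM articles
WHERE meta @> 'format => blog, language => English'::hstore;
```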
Sometimes you want to query only the rows that contain a certain key in an HStore field. To do this, use the ? operator. For example, the following query returns only the rows where the meta HStore contains the note key.
> SELECT title, meta->'note' AS note FROM articles WHERE meta ? 'note';
title | note
---------------------------------+-----------------
PostgreSQL Fundamentals | hold for review
Advanced Querying in PostgreSQL | needs edit
(2 rows)
The PostgreSQL documentation lists many useful HStore operators and functions. For example, you can extract an HStore’s keys into an array, or convert an HStore to its JSON representation.
> SELECT title, akeys(meta) FROM articles where id=1;
title | akeys
--------------------------+----------------------------------
Data Types in PostgreSQL | {format,length,license,language}
(1 row)
> SELECT title, hstore_to_json(meta) FROM articles where id=1;
title | hstore_to_json
--------------------------+------------------------------------------------
Data Types in PostgreSQL | {"format": "blog", "length": "1350", "license": "Creative Commons", "language": "English"}
(1 row)
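HStore values can also be modified in place. The following sketch uses the articles table from above. The note value is made up for illustration, and the transaction is rolled back so that the sample data stays unchanged:

```sql
BEGIN;

-- Add or overwrite a key by concatenating another hstore.
UPDATE articles
SET meta = meta || 'note => "needs edit"'::hstore
WHERE id = 1;

-- Expand an hstore into one row per key-value pair.
SELECT a.id, kv.key, kv.value
FROM articles AS a, each(a.meta) AS kv
WHERE a.id = 1;

-- Remove a key; the explicit cast selects the hstore - text operator.
UPDATE articles
SET meta = meta - 'note'::text
WHERE id = 1;

ROLLBACK;
```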
Basic JSONB operations
Working with the JSONB data type in PostgreSQL is straightforward. Creating a table and inserting records looks like this:
> CREATE TABLE authors (id serial primary key, name varchar(64), meta jsonb);
> INSERT INTO authors (name, meta) VALUES ('Adam Anderson', '{ "active":true, "expertise": ["databases", "data science"], "country": "UK" }');
Note that the jsonb meta field is provided as a JSON-formatted text string. PostgreSQL will throw an error if the value you provide is not valid JSON.
> INSERT INTO authors (name, meta) VALUES ('Barbara Brandini', '{ "this is not valid JSON" }');
ERROR: invalid input syntax for type json
Unlike the HStore type, JSONB supports nested data.
> INSERT INTO authors (name, meta) VALUES ('Barbara Brandini', '{ "active":true, "expertise": ["AI/ML"], "country": "CAN", "contact": { "email": "barbara@example.com", "phone": "111-222-3333" } }');
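Nested JSONB values can be changed without rewriting the whole document by hand. The statements below are a sketch against the authors table. The replacement phone number is invented, and everything runs inside a transaction that is rolled back so the sample data is left as inserted:

```sql
BEGIN;

-- Replace one nested value; the path is given as a text array.
UPDATE authors
SET meta = jsonb_set(meta, '{contact,phone}', '"444-555-6666"')
WHERE name = 'Barbara Brandini';

-- Merge top-level keys; keys on the right overwrite those on the left.
UPDATE authors
SET meta = meta || '{"active": false}'::jsonb
WHERE name = 'Barbara Brandini';

-- Remove a nested key by path.
UPDATE authors
SET meta = meta #- '{contact,phone}'
WHERE name = 'Barbara Brandini';

SELECT name, meta FROM authors WHERE name = 'Barbara Brandini';

ROLLBACK;
```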
As with HStore, JSONB fields can be partially retrieved by specifying only certain keys. For example:
> SELECT name, meta -> 'country' AS country FROM authors;
name | country
------------------+---------
Adam Anderson | "UK"
Barbara Brandini | "CAN"
Charles Cooper | "UK"
(3 rows)
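The quotes around each country appear because -> returns a jsonb value. As a sketch of the alternatives, the ->> operator returns plain text, and operators can be chained or combined with a path to reach nested values. The column aliases here are arbitrary:

```sql
-- -> returns jsonb (strings keep their quotes); ->> returns plain text.
SELECT name,
       meta -> 'country'  AS country_jsonb,
       meta ->> 'country' AS country_text
FROM authors;

-- Chain operators, or use a path, to reach nested values.
-- Rows without the key or path yield NULL.
SELECT name,
       meta -> 'contact' ->> 'email' AS email,
       meta #>> '{contact,phone}'    AS phone,
       meta -> 'expertise' ->> 0     AS first_expertise
FROM authors;
```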
The JSONB data type has many operators whose usage is similar to HStore. For example, the following use of the ? operator retrieves only the rows where the meta field contains the contact key.
> SELECT name, meta -> 'active' AS active, meta -> 'contact' AS contact FROM authors WHERE meta ? 'contact';
name | active | contact
------------------+--------+-----------------------------------------------
Barbara Brandini | true | {"email": "barbara@example.com", "phone": "111-222-3333"}
Charles Cooper | false | {"email": "charles@example.com"}
(2 rows)
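Beyond key existence, JSONB supports containment queries with the @> operator, which also works on nested arrays and objects. A brief sketch against the authors table:

```sql
-- Rows whose document contains the given key-value pair.
SELECT name FROM authors WHERE meta @> '{"country": "UK"}';

-- Array containment: authors whose expertise list includes "databases".
SELECT name FROM authors WHERE meta @> '{"expertise": ["databases"]}';

-- Rows that have at least one of the listed top-level keys.
SELECT name FROM authors WHERE meta ?| array['contact', 'expertise'];
```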
Working with indexes
According to the documentation, the HStore data type “has GiST and GIN index support for the @>, ?, ?& and ?| operators.” For more information on the differences between the two types of indexes, see the PostgreSQL documentation. JSONB indexing uses GIN indexes to facilitate efficient lookup of keys or key-value pairs.
The statements to create these indexes look as you would expect:
> CREATE INDEX idx_hstore ON articles USING GIN(meta);
> CREATE INDEX idx_jsonb ON authors USING GIN(meta);
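Depending on the queries you run, other index definitions may fit better. The following sketch shows two variations on the authors table. The index names are arbitrary, and whether the planner actually uses an index depends on table size and statistics, so on a tiny sample table you will likely still see a sequential scan:

```sql
-- A GIN index with the jsonb_path_ops operator class is usually smaller,
-- but it does not support the key-existence operators (?, ?|, ?&).
CREATE INDEX idx_jsonb_path ON authors USING GIN (meta jsonb_path_ops);

-- A B-tree expression index for equality or range filters on one key.
CREATE INDEX idx_authors_country ON authors ((meta ->> 'country'));

-- Inspect the query plan to see which access path is chosen.
EXPLAIN SELECT name FROM authors WHERE meta @> '{"country": "UK"}';
EXPLAIN SELECT name FROM authors WHERE meta ->> 'country' = 'UK';
```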
SQL constructs with NoSQL flexibility
Let’s look again at the original use case mentioned at the beginning. Imagine a news content agency that stores articles in much the same way a NoSQL document store would. Perhaps an article could be represented in JSON as an ordered array of objects representing sections, each containing text content, annotations, and formatting. Additionally, a lot of metadata is associated with each article, and those metadata attributes are inconsistent across articles.
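As a rough sketch of what such a representation might look like, the example below stores an article body as a JSONB array of section objects and then unpacks it in order. The article_bodies table, its columns, and the section keys are all hypothetical and chosen only for illustration:

```sql
-- Hypothetical table: one JSONB array of section objects per article.
CREATE TABLE article_bodies (
    article_id integer PRIMARY KEY,
    sections   jsonb NOT NULL
);

INSERT INTO article_bodies (article_id, sections) VALUES (
    1,
    '[
        {"heading": "Introduction", "text": "Why data types matter."},
        {"heading": "HStore", "text": "Flat key-value pairs.", "notes": ["check docs"]}
    ]'
);

-- Expand the array into one row per section, keeping the original order.
SELECT b.article_id,
       s.idx                   AS section_number,
       s.section ->> 'heading' AS heading
FROM article_bodies AS b
CROSS JOIN LATERAL jsonb_array_elements(b.sections)
     WITH ORDINALITY AS s(section, idx)
ORDER BY b.article_id, s.idx;
```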
While the above description summarizes most of an organization’s NoSQL needs, everything else about how data is managed and organized is closely aligned with the relational data model.
By combining the NoSQL capabilities of data types such as JSONB with the strengths of PostgreSQL’s traditional SQL, organizations can take advantage of flexible schemas and fast queries on nested data, while still being able to perform join operations and enforce data relationships. PostgreSQL’s HStore and JSONB data types provide powerful options for developers who need not only the structure of a relational database, but also NoSQL-style data storage.
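To make the combination concrete, here is a sketch that links the articles and authors tables from the earlier examples with an ordinary foreign key and then filters on document fields within a join. The author_id column is a hypothetical addition, and the sketch assumes an author with id 1 exists:

```sql
-- Hypothetical: link each article to its author with a foreign key.
ALTER TABLE articles ADD COLUMN author_id integer REFERENCES authors (id);
UPDATE articles SET author_id = 1 WHERE id = 1;

-- A relational join combined with HStore and JSONB lookups.
SELECT a.title,
       a.meta -> 'license'   AS license,
       au.name,
       au.meta ->> 'country' AS author_country
FROM articles AS a
JOIN authors AS au ON au.id = a.author_id
WHERE au.meta @> '{"active": true}';
```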
PostgreSQL at scale
Want to support NoSQL-style data storage and querying while staying within the framework of a traditional relational database? Perhaps your organization handles documents in a way similar to the one described in this post. Or maybe you’re looking for options to handle storage of unstructured data for large language models (LLMs) or other AI/ML undertakings.
PostgreSQL Clusters on the Linode Marketplace offer the relational model and structure of a SQL database with the horizontal scalability of a NoSQL database. Combining this with the HStore or JSONB data types gives you an ideal hybrid solution for leveraging NoSQL capabilities when working within PostgreSQL.