Postgres database instability on VPS

I have a PostgreSQL 14 database on a VPS (Ubuntu 22) and I constantly run into the problem that it sometimes just stops working.

When that happens I run sudo systemctl restart postgresql.service and the database starts working again.

Recently it happened again, but after running sudo systemctl restart postgresql.service all my tables had disappeared.

Does anyone have an idea what could cause something like this?
[Screenshots attached, including the journalctl output]
It crashed while running the queries shown in the following code:
import psycopg2 as pg
from tqdm import tqdm

pol_scraper = PolygonScraper()

conn = pg.connect(
    "dbname=postgres user=postgres password=mypw host=myhost port=5432"
)
cur = conn.cursor()
max_block = pol_scraper.web3.eth.block_number
# max_block = 57063842
for i, ((block, transaction_hash), transfer) in tqdm(enumerate(pol_scraper.get_all_data(0, max_block))):
    # print(block)
    # print(transfer['value'])
    try:
        cur.execute(
            """
            INSERT INTO polymarket_transfers (type, token_address, sender, receiver, token_id, transfer_event, value, transaction_hash, block_number)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s) ON CONFLICT DO NOTHING""",
            (
                transfer['type'],
                transfer['token_address'],
                transfer['from'],
                transfer['to'],
                transfer['token_id'],
                transfer['transfer_event'],
                transfer['value'],
                transaction_hash,
                block
            ),
        )
        conn.commit()
        if i % 5000 == 0:
            print(f"Last trade: {transaction_hash}, {block}")

    except Exception as e:
        print(e)
        print(transaction_hash)
        print(transfer)


2 Answers

From your question, it appears that you are experiencing frequent crashes of your PostgreSQL 14 database on a VPS running Ubuntu 22.04, and the issue seems to manifest particularly when executing queries. After restarting the PostgreSQL service, your tables unexpectedly disappear, which is a severe issue.

Let's break down possible causes and solutions:

Potential Causes

  1. Resource Limitations:

    • Memory: PostgreSQL might be crashing because the VPS runs out of memory, either through its own allocations failing or by being killed by the Linux out-of-memory (OOM) killer. Long-running bulk loads like the one in your script add to that pressure (a quick check for OOM kills is sketched after this list).
    • CPU: Intensive operations can also cause high CPU usage, possibly leading to instability.
  2. Disk Space:

    • Ensure you have enough disk space available. Lack of disk space can cause severe issues, such as inability to write data, leading to crashes.
  3. Configuration Issues:

    • PostgreSQL might not be configured correctly for your workload and the server resources. Parameters such as work_mem, shared_buffers, and maintenance_work_mem might need tuning.
  4. Corruption:

    • Sudden stops and restarts can sometimes cause data corruption, which might explain the disappearance of tables.
  5. File System Issues:

    • The underlying file system could have issues, causing instability in PostgreSQL.
  6. Transaction Handling:

    • Improper transaction handling in your script can add load and leave connections in a bad state: committing after every single row is expensive, and an error that is never rolled back leaves the connection stuck in an aborted transaction.
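
If memory pressure is the suspect, the Linux out-of-memory (OOM) killer usually leaves a trace in the kernel log when it terminates a PostgreSQL backend. A quick check using standard Ubuntu tooling (nothing about your setup is assumed beyond the service name):

# Look for OOM-killer activity in the kernel log
sudo dmesg -T | grep -i -E 'out of memory|oom-killer|killed process'

# Or search the systemd journal around the time PostgreSQL stopped
sudo journalctl -k | grep -i oom
sudo journalctl -u postgresql.service --since today

If PostgreSQL shows up in that output, the fix is more RAM (or swap) or a smaller memory footprint (shared_buffers, work_mem), not a change to your script.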

Steps to Troubleshoot and Resolve

  1. Check System Resources:

    • Verify available memory: free -m
    • Check CPU usage: top or htop
    • Check disk space: df -h

    Ensure your VPS has sufficient resources. Consider upgrading if necessary.

  2. PostgreSQL Logs:

    • Check the PostgreSQL logs in /var/log/postgresql/ (on Ubuntu, typically postgresql-14-main.log) for error messages logged just before the crash; they usually show what went wrong.
  3. System Logs:

    • journalctl -xe can show you system-wide errors related to PostgreSQL.
  4. Adjust PostgreSQL Configuration:

    • Modify postgresql.conf and tune the settings:
      shared_buffers = 25% of available RAM
      work_mem = 2-4MB per sort/hash operation (it is allocated per operation, not once per connection), or more depending on your queries
      maintenance_work_mem = 64MB or more
      max_connections = a number that matches your workload but doesn’t overcommit resources
      
    • After making changes, restart PostgreSQL: sudo systemctl restart postgresql
  5. Transaction Management in Scripts:

    • Ensure your Python script is managing transactions properly:
      try:
          # your db operations
          conn.commit()
      except Exception as e:
          conn.rollback()
          print(e)
      

    This ensures that in case of an error, changes do not leave the database in an inconsistent state.

  6. Backup and Recovery:

    • Always take regular backups of your database to guard against data loss, using pg_dump or a similar tool (an example command is shown after this list).
  7. Recreate Missing Tables:

    • If tables are missing, restore them from your most recent backup and investigate why they disappeared to prevent recurrence. One thing worth ruling out: Ubuntu can run several PostgreSQL clusters side by side, and connecting to a different (or freshly initialised) cluster or database looks exactly like "all tables disappeared". The commands after this list show how to check.
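
For steps 6 and 7, the commands below are a minimal sketch; the database name (postgres) and the backup path are assumptions, so adjust them to your setup:

# 6. Take a compressed custom-format backup (restorable with pg_restore)
sudo -u postgres pg_dump -Fc -f /var/backups/postgres_$(date +%F).dump postgres

# 7. Check which clusters exist and which one is running
pg_lsclusters

# Confirm you are connected to the database you expect and list its tables
sudo -u postgres psql -d postgres -c '\conninfo'
sudo -u postgres psql -d postgres -c '\dt'

pg_lsclusters comes with Ubuntu's postgresql-common package; if it shows more than one cluster, make sure your application and psql are pointing at the same port.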

Example: Optimized Script Handling

Below is an optimized version of your script, focusing on proper exception handling and batching:

import psycopg2 as pg
from tqdm import tqdm

pol_scraper = PolygonScraper()

conn = pg.connect(
    "dbname=postgres user=postgres password=mypw host=myhost port=5432"
)

cur = conn.cursor()
max_block = pol_scraper.web3.eth.block_number

# Insert rows in batches to cut per-row commit and round-trip overhead
batch_size = 5000
batch = []

insert_sql = """
    INSERT INTO polymarket_transfers (type, token_address, sender, receiver, token_id, transfer_event, value, transaction_hash, block_number)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s) ON CONFLICT DO NOTHING"""

for i, ((block, transaction_hash), transfer) in tqdm(enumerate(pol_scraper.get_all_data(0, max_block))):
    batch.append(
        (
            transfer['type'], transfer['token_address'], transfer['from'],
            transfer['to'], transfer['token_id'], transfer['transfer_event'],
            transfer['value'], transaction_hash, block
        )
    )

    # i counts transfers rather than blocks, so flush purely on batch size;
    # any leftover rows are written after the loop
    if (i + 1) % batch_size == 0:
        try:
            cur.executemany(insert_sql, batch)
            conn.commit()
            batch = []
            print(f"Processed up to block: {block}")

        except Exception as e:
            conn.rollback()
            print(e)
            print(transaction_hash)
            print(transfer)

# Flush any rows left over when the loop ends mid-batch
if batch:
    cur.executemany(insert_sql, batch)
    conn.commit()

cur.close()
conn.close()

This script accumulates rows and inserts them in batches, committing once per batch instead of once per row. That reduces commit and round-trip overhead and lowers the transaction load on the database, which helps if resource exhaustion is behind the crashes.
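
If insert throughput is still a concern, psycopg2's execute_values helper (in psycopg2.extras) generally performs bulk inserts faster than executemany; adopting it would mean rewriting the VALUES clause as a single %s placeholder, but the rest of the batching logic above stays the same.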

Final Steps

  • Monitor your PostgreSQL service closely after implementing these changes.
  • Consider temporarily enabling logging of all statements (log_statement = 'all') in postgresql.conf to get a detailed record of the queries being run; it is verbose, so turn it off once you have what you need.
  • Regularly monitor system metrics and PostgreSQL logs to prevent future issues.

If crashes continue, consider involving your VPS provider or a database expert for a more in-depth analysis.

This question is about a PostgreSQL 14 database on an Ubuntu 22 VPS (Virtual Private Server) that suddenly stopped working and whose tables all disappeared after the PostgreSQL service was restarted. The user ran into the issue while writing to the database from Python.

Troubleshooting Steps

To troubleshoot this issue, we need to examine the system logs and PostgreSQL logs.

  1. System logs: Your journalctl output shows that the PostgreSQL service was restarted, but not necessarily why it stopped. Check the full unit log; on Ubuntu the per-cluster unit is usually postgresql@14-main.service:
journalctl -u postgresql.service
journalctl -u postgresql@14-main.service
  2. PostgreSQL logs: On Ubuntu the server log is normally written to /var/log/postgresql/postgresql-14-main.log; check it for errors logged around the time of the crash. Alternatively, set log_destination to syslog in postgresql.conf (the main configuration file for PostgreSQL) to route the server log through the system logger, and restart the service.

  3. Table corruption: PostgreSQL uses a write-ahead logging (WAL) mechanism to ensure data consistency; after an unclean shutdown it replays the WAL automatically at startup, so committed tables should not simply vanish. Anything not committed before the crash, however, is lost.

If you have backups, you can try restoring the database with pg_restore (for custom-format pg_dump archives) or psql (for plain SQL dumps) to see if that brings the tables back; a sketch follows below.
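
A minimal sketch of such a restore, assuming a custom-format dump at /var/backups/postgres_2024-05-24.dump and the target database postgres (both names are assumptions):

sudo -u postgres pg_restore --clean --if-exists -d postgres /var/backups/postgres_2024-05-24.dump

If the backup is a plain SQL dump instead, load it with sudo -u postgres psql -d postgres -f backup.sql.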

  4. Database consistency: There might be inconsistencies within the database. Try to identify the last transaction that completed before the restart so you know how far the loader got and from which block it has to resume.

  5. pg_stat_activity to inspect running queries: Note that pg_stat_activity only shows sessions and statements that are running right now; it keeps no history and cannot be used to restore anything. While your loader is running, though, it will show whether the INSERTs are active or blocked:

SELECT * FROM pg_stat_activity WHERE query LIKE '%INSERT INTO polymarket_transfers%';
  6. pg_rewind and recovery: pg_rewind resynchronises a data directory with another copy of the same cluster after their timelines have diverged (for example, reattaching an old primary to a promoted replica after failover); it does not rewind a standalone server to an earlier state. To roll a single server back to a last known good point you need point-in-time recovery from a base backup plus archived WAL, which is a manual process and requires careful understanding of PostgreSQL internals.

  7. Error handling: Your code catches exceptions but never rolls back, so after a failed statement the connection is left in an aborted transaction until the next commit. Roll back on error and log the details:

    try:
        # your database operations
        conn.commit()
    except Exception as e:
        conn.rollback()
        print("An error occurred:", str(e))
        # handle and log the error (e.g. record the failing row)
Remember that PostgreSQL has robust crash-recovery mechanisms to minimize data loss. You can also enable WAL archiving (archive_mode and archive_command) and set archive_timeout in postgresql.conf so that WAL (Write-Ahead Log) segments are archived regularly, which limits how much can be lost in case of a failure.
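
A minimal sketch of such a configuration, assuming a local archive directory at /var/lib/postgresql/wal_archive (the path is an assumption; any durable location reachable from the server works):

# postgresql.conf
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
archive_timeout = 300    # force a WAL segment switch at least every 5 minutes

Changing archive_mode requires a server restart; archive_command and archive_timeout only need a reload.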

Remember that a PostgreSQL database can recover from most failures on its own, so if tables keep disappearing after restarts, the logs above are the best place to look for the real cause.