Apache Answer: call out whenever a new question is asked

I have the open source “Apache Answer” project deployed. Now, whenever a new question is asked, I want to make an API call to OpenAI. What are some approaches? I value ease of development right now since this is an MVP.

Answer doesn't seem to have any webhooks out of the box. Here's the API for my instance.

It does have some limited email notifications, but those seem slow. Polling every few minutes for new questions also seems too slow. I want users to feel like their question gets answered in seconds, not minutes.

I have a working Function defined using DigitalOcean Functions that, when called with a question_id, will POST an answer from GPT-4o back to that question.

I tried migrating Ducky.foo over to using Postgres on Supabase, thinking that I'd kill two birds with one stone: a more scalable database and the ability to call my DO Function from a database trigger. I had a successful test environment with Answer running on Supabase Postgres but got stuck trying to set up the trigger.

From the Supabase docs, I got this trigger set up:

-- Create the trigger for table update
CREATE TRIGGER sample_trigger
    AFTER UPDATE ON test_table
    FOR EACH ROW
    EXECUTE FUNCTION sample_trigger_function();

and it's calling this function:

-- Create the function that will be called by the trigger
CREATE OR REPLACE FUNCTION sample_trigger_function()
    RETURNS trigger
    LANGUAGE plpgsql
AS $$
BEGIN
    -- Calls pg_net function net.http_post to send a request to the Postman API
    PERFORM "net"."http_post"(
        'https://postman-echo.com/post'::text,
        jsonb_build_object(
            'old_row', to_jsonb(OLD.*),
            'new_row', to_jsonb(NEW.*)
        ),
        headers:='{"Content-Type": "application/json"}'::jsonb
    ) AS request_id;
    RETURN NEW;
END $$;

It seems to work when I UPDATE a row in the table:

UPDATE test_table SET value = 706 WHERE name = 'Test';

But I couldn't find Trigger/Function logs in Supabase to see the result of the run.
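
From the pg_net docs, it looks like requests are queued asynchronously and each response gets recorded in a table in the net schema, so a query like this (assuming the default Supabase pg_net setup; I haven't verified the exact columns) should show recent results:

-- pg_net records the outcome of each async request here;
-- old rows are pruned automatically after a while
SELECT id, status_code, content, error_msg, created
FROM net._http_response
ORDER BY created DESC
LIMIT 5;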

So then I tried updating the sample_trigger_function to call my DO Function:

CREATE OR REPLACE FUNCTION public.sample_trigger_function()
    RETURNS trigger
    LANGUAGE plpgsql
AS $$
BEGIN
    -- Calls pg_net function net.http_post to send a request to the specified API
    PERFORM public.net.http_post(
        'https://faas-nyc1-2ef2e6cc.doserverless.co/api/v1/namespaces/fn-4debf4e9-6121-41dc-a674-9ba721490ff8/actions/answer/test-pg-chatgpt-4o?blocking=true&result=true'::text,
        jsonb_build_object(
            'question_id', '10010000000000009'
        ),
        headers:='{
            "Content-Type": "application/json",
            "Authorization": "Basic <TOKEN>"
        }'::jsonb
    ) AS request_id;
    RETURN NEW;
END $$;

But now when I UPDATE a row in the database, I get this error:

ERROR:  0A000: cross-database references are not implemented: public.net.http_post
QUERY:  SELECT public.net.http_post(
        'https://faas-nyc1-2ef2e6cc.doserverless.co/api/v1/namespaces/fn-4debf4e9-6121-41dc-a674-9ba721490ff8/actions/answer/test-pg-chatgpt-4o?blocking=true&result=true'::text,
        jsonb_build_object(
            'question_id', '10010000000000009'
        ),
        headers:='{
            "Content-Type": "application/json",
            "Authorization": "Basic ZDU3NGM0ZTYtMzMzOS00NDJmLWE1NjUtOGI3ZDJlMmU1ZjdhOlB6MWs3Q0ZKdjREM2Y0WWZmd1JDYjBVR3p0R3F5bjJCbFpSMHpDcjhRdmlyZFB2eHlsUXdDZHloV3RRS2syS2s="
        }'::jsonb
    ) AS request_id
CONTEXT:  PL/pgSQL function sample_trigger_function() line 4 at PERFORM

I'm not sure whether to keep plowing ahead or to try a different path.

3 Answers

Hey Jake, I really don't think you want to do this with stored procedures. Stored procedures calling an external URL is of course possible (as you've seen with your attempts), but it's really not a good idea at all. I might stop short of calling it an anti-pattern, but it's definitely a bad smell.

In general there isn't much of a case for using stored procedures at all in 2024, and if you do use them, they should be tiny, efficient, self-contained, specialized things that are about making the database operate the way you want, not about external integrations.

It's unfortunate that Apache Answer doesn't have webhooks. That's really what you want here, and what it needs as a product too. If you're feeling ambitious and helpful, you should implement them and open a PR! If you're feeling especially generous instead, you should pay a developer who's working on Apache Answer to build them.

If neither of those options seems especially interesting to you (since it's just an MVP), you should take advantage of Apache Answer being OSS: go to the part of the code where it's done creating a new question and jam in your call to your DigitalOcean function.

Thanks, @sean-johnson! I know at some point I need to dive in to the Answer code base. I'm trying to push that off as long as possible. What do you think about the Supabase Realtime API approach?

I ended up working with the Golang code directly and implementing a minimal webhook on question creation into Apache Answer. I'll look to contribute it back to the Apache project.

Sure, let's break down the approaches and potential solutions for your problem.

1. Understanding the System Setup

From your description:

  • You want an API call to OpenAI whenever a new question is asked in the Apache Answer system.
  • You're using DigitalOcean Functions to handle the API call to OpenAI.
  • You've migrated your database to Supabase and are setting up database triggers to detect new questions and call your function.

2. Resolving Cross-Database Reference Issues

The error comes from how the function name is qualified. Postgres parses public.net.http_post as database public, schema net, function http_post, and Postgres does not implement cross-database references. pg_net installs its functions in the net schema, so the call should be net.http_post(...) with no public. prefix.
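
Here's a minimal sketch of the corrected function, switched to an AFTER INSERT trigger so it fires on new questions rather than on updates to a test table. The table and column names here (public.questions, id) are assumptions; adjust them to match the actual Answer schema. Also note pg_net is asynchronous, so the trigger won't wait on the response regardless of the blocking=true query parameter:

-- Corrected: call net.http_post, not public.net.http_post
CREATE OR REPLACE FUNCTION public.notify_do_function()
    RETURNS trigger
    LANGUAGE plpgsql
AS $$
BEGIN
    PERFORM net.http_post(
        'https://faas-nyc1-2ef2e6cc.doserverless.co/api/v1/namespaces/fn-4debf4e9-6121-41dc-a674-9ba721490ff8/actions/answer/test-pg-chatgpt-4o?blocking=true&result=true'::text,
        -- send the id of the row that fired the trigger, not a hard-coded one
        jsonb_build_object('question_id', NEW.id::text),
        headers:='{"Content-Type": "application/json", "Authorization": "Basic <TOKEN>"}'::jsonb
    );
    RETURN NEW;
END $$;

-- Fire on new questions ("public.questions" is an assumed table name)
CREATE TRIGGER question_created
    AFTER INSERT ON public.questions
    FOR EACH ROW
    EXECUTE FUNCTION public.notify_do_function();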

3. Alternative Approaches

A. Using Supabase Realtime API

Supabase provides a Realtime feature that can listen to changes in your database. You can subscribe to new inserts and, once detected, trigger your DigitalOcean Function.

Steps:

  1. Set up Realtime:

    import { createClient } from '@supabase/supabase-js';
    
    const supabase = createClient('https://your-project-id.supabase.co', 'your-public-anon-key');
    
    // Listen for INSERTs on the questions table. Realtime only emits
    // events for tables added to the supabase_realtime publication
    // (see the SQL note after these steps).
    const channel = supabase
      .channel('question-inserts')
      .on('postgres_changes', { event: 'INSERT', schema: 'public', table: 'questions' }, payload => {
        console.log(payload);
        // Call the DigitalOcean function with the new question payload
        fetch('https://faas-nyc1-2ef2e6cc.doserverless.co/api/v1/namespaces/fn-4debf4e9-6121-41dc-a674-9ba721490ff8/actions/answer/test-pg-chatgpt-4o?blocking=true&result=true', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Basic ${process.env.DO_TOKEN}`,
          },
          body: JSON.stringify({ question_id: payload.new.id }),
        }).then(response => response.json())
          .then(data => console.log(data))
          .catch(error => console.error('Error:', error));
      })
      .subscribe();
    
  2. Deploy and Run:

    • Run this script as a long-lived process (a small VM, container, or always-on worker); it needs a persistent websocket connection to Supabase Realtime, so a short-lived serverless function won't fit here.
    • Handle potential network interruptions and auto-reconnect to Supabase Realtime.
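
One caveat, if I'm reading the Supabase docs right: postgres_changes events are only broadcast for tables that have been added to the supabase_realtime publication, so you'd first enable the table from SQL:

-- Realtime only streams changes for tables in this publication
-- ("public.questions" is an assumed name; match your Answer schema)
ALTER PUBLICATION supabase_realtime ADD TABLE public.questions;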

B. Leverage Kafka for Event Streaming

Supabase can integrate with Kafka for real-time event streaming, allowing you to handle events asynchronously with more flexibility and reliability.

Steps:

  1. Set up Kafka and Connectors:

    • Set up a Kafka cluster (you can use managed services like Confluent Cloud).
    • Use a Supabase database webhook to publish new questions to a Kafka topic via an HTTP Source connector or a Kafka REST proxy (an HTTP Sink connector goes the other direction, Kafka to HTTP).
  2. Create a Consumer Service:

    • Develop a consumer service that listens to Kafka topics.
    • When a new question message is received, the service calls your DigitalOcean Function.

Example Consumer:

// Requires Node 18+ for the global fetch API
const { Kafka } = require('kafkajs');

// Initialize Kafka
const kafka = new Kafka({
  clientId: 'your-client-id',
  brokers: ['broker1:9092', 'broker2:9092']
});

const consumer = kafka.consumer({ groupId: 'question-consumers' });

const run = async () => {
  await consumer.connect();
  await consumer.subscribe({ topic: 'new-questions', fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      const question = JSON.parse(message.value.toString());
      console.log(`Received question: ${question.id}`);
      
      try {
        // Await the request so the message isn't marked processed before it completes
        const response = await fetch('https://faas-nyc1-2ef2e6cc.doserverless.co/api/v1/namespaces/fn-4debf4e9-6121-41dc-a674-9ba721490ff8/actions/answer/test-pg-chatgpt-4o?blocking=true&result=true', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Basic ${process.env.DO_TOKEN}`,
          },
          body: JSON.stringify({ question_id: question.id }),
        });
        console.log(await response.json());
      } catch (error) {
        console.error('Error:', error);
      }
    }
  });
};

run().catch(console.error);

Conclusion

Given your MVP focus, leveraging Supabase Realtime might be the quickest path. This approach avoids complex database triggers and cross-database reference issues, allowing you to handle events on the application side with more control and easier debugging. If you need more advanced scalability and reliability, consider the Kafka solution later on.

The 3.B answer is horrible here. You don't want to get anywhere close to Kafka for an MVP.

3.A seems like a reasonable approach. It's essentially a webhook, just not one from Apache Answer; it's one from Supabase. An event on question insert that calls your function should get you what you want.

Good luck!

3.A almost works but is inconsistent. It seems like the way Answer implements question IDs interferes with auto-incrementing and with Supabase's Realtime API. I'll go with your suggestion to just work with the backend Golang code directly.

I see that you're trying to integrate an open source project called [Apache Answer](https://github.com/apache/incubator-answer) with OpenAI and DigitalOcean Functions. You've faced some hurdles setting up a trigger function in Supabase Postgres to call your DO Function when a new question is asked.

Let's break down the approaches and the challenges you've faced:

  1. Approaches:

a. Webhooks: Since Apache Answer doesn't have webhooks out of the box, you could build a custom one: patch Answer to POST to an HTTP endpoint whenever a new question is asked, and have that endpoint make the API call to OpenAI.

b. Database Triggers: Your attempt to use a database trigger to call your DO Function is a great approach. However, you encountered an error due to cross-database references.

c. Polling: A third option is a scheduled job (e.g., using a library like schedule in Python) that queries the Apache Answer API every few minutes for new questions. This approach won't provide the real-time experience you want.

  2. Challenges:

a. Cross-Database References: As you've encountered, qualifying the pg_net call as public.net.http_post makes Postgres read it as a reference to another database. Calling net.http_post directly from the trigger function avoids this.

b. Authorization: Make sure the Authorization header for your DO Function call is set up correctly, and avoid embedding raw tokens in SQL you share.

c. Scalability: As Apache Answer grows, you'll need to consider scalability concerns when designing your solution.

To overcome these challenges, consider the following:

a. Hybrid Approach: Combine approaches a and c to achieve a balance between real-time performance and scalability.

b. Third-Party Libraries: Utilize libraries like requests or aiohttp to simplify API calls to OpenAI.

c. Supabase Integration: Explore Supabase's built-in Database Webhooks, which fire an HTTP request on table events; they're built on the same pg_net mechanism you've been wiring up by hand.

d. Error Handling: Implement robust error handling mechanisms to ensure your solution remains resilient in the face of errors.

e. Logging and Monitoring: Set up logging and monitoring mechanisms to track your solution's performance, identify bottlenecks, and troubleshoot issues.

Remember to weigh the trade-offs between complexity, scalability, and maintainability when designing your solution. By combining these approaches, you can create a robust and scalable integration between Apache Answer, OpenAI, and DigitalOcean Functions.