Scheduling Audio Transcriptions with QStash
In this tutorial, you will learn how to build a scheduled audio transcription system using Upstash QStash for task scheduling and Fireworks AI for transcription. You will also learn techniques for secure file uploads to Cloudflare R2, user authentication with Clerk, and data storage with Upstash Redis.
Prerequisites
You will need the following:
- Node.js 18 or later
- A Clerk account
- An Upstash account
- A Fireworks account
- A Cloudflare account
- A Vercel account
Tech Stack
Technology | Description |
---|---|
Next.js | The React Framework for the Web. |
Clerk | User Management Platform. You are going to use it to add authentication to your application. |
Upstash | Serverless database platform. You are going to use Upstash QStash to schedule transcriptions and Upstash Redis to store per-user transcription status. |
Fireworks | A generative AI inference platform to run and customize models with speed and production-readiness. |
Cloudflare R2 | A cloud object storage service. |
Vercel | A cloud platform for deploying and scaling web applications. |
Generate a Fireworks AI Token
Using the Fireworks AI API, you can create a transcription of an audio file. Any request to the Fireworks AI API requires an authorization token. To obtain the token, navigate to API Keys in your Fireworks AI account and click the Create API Key button. Copy and securely store this token for later use as the FIREWORKS_API_KEY environment variable.
Setting up Upstash Redis
In your Upstash dashboard, go to the Redis tab and create a database.
Scroll down until you find the REST API section, and select the .env button. Copy the content and save it somewhere safe.
Setting up Upstash QStash
To schedule POST requests to the endpoint that transcribes audio, you will use QStash. Go to the QStash tab and scroll down to the Request Builder tab.
Now, copy the QStash URL, QStash TOKEN, Current Signing Key, and Next Signing Key, and save them somewhere safe.
Create a new Clerk application
In your Clerk Dashboard, press the + New application card to create a new app and start configuring its authentication setup.
Choose an application name, enable credential-based authentication by toggling on Email, and allow Social Sign-On by toggling on providers such as Google, GitHub and Microsoft.
Once the application is created in the Clerk dashboard, you will be shown your application's API keys for Next.js. Copy the content and save it somewhere safe.
Create a new Next.js application
Let’s get started by creating a new Next.js project. Open your terminal and run the following command:
When prompted, choose:
- Yes when prompted to use TypeScript.
- No when prompted to use ESLint.
- Yes when prompted to use Tailwind CSS.
- No when prompted to use src/ directory.
- Yes when prompted to use App Router.
- No when prompted to customize the default import alias (@/*).
Once that is done, move into the project directory and start the app in development mode by executing the following command:
The app should be running on localhost:3000. Stop the development server to install the necessary dependencies with the following commands:
The libraries installed include:
- form-data: A library to create readable multipart/form-data streams.
- node-fetch: A module that brings the Fetch API to Node.js.
- @clerk/nextjs: Clerk’s SDK for Next.js.
- @upstash/redis: SDK to interact over HTTP requests with Redis, built on top of Upstash REST API.
- @upstash/qstash: SDK to interact with your Upstash QStash instance over HTTP requests.
- @aws-sdk/client-s3: AWS SDK for JavaScript S3 Client for Node.js, Browser and React Native.
- @aws-sdk/s3-request-presigner: SDK to generate a presigner based on signature V4 that will attempt to generate signed url for S3.
Now, create a .env file at the root of your project. You are going to add the FIREWORKS_API_KEY, AWS_KEY_ID, AWS_REGION_NAME, AWS_S3_BUCKET_NAME, AWS_SECRET_ACCESS_KEY, CLOUDFLARE_R2_ACCOUNT_ID, NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY, CLERK_SECRET_KEY, QSTASH_TOKEN, QSTASH_CURRENT_SIGNING_KEY, QSTASH_NEXT_SIGNING_KEY, UPSTASH_REDIS_REST_URL, and UPSTASH_REDIS_REST_TOKEN values you obtained earlier. It should look something like this:
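With this many variables, a missing one tends to surface only as a runtime error deep inside a route handler, so a small startup check can help. The following is a minimal sketch and not part of the original project: the helper name missingEnvVars is hypothetical, and the list simply mirrors the variable names above.

```typescript
// Names of the variables this tutorial asks you to define in .env.
const REQUIRED_ENV_VARS = [
  "FIREWORKS_API_KEY",
  "AWS_KEY_ID",
  "AWS_REGION_NAME",
  "AWS_S3_BUCKET_NAME",
  "AWS_SECRET_ACCESS_KEY",
  "CLOUDFLARE_R2_ACCOUNT_ID",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "QSTASH_TOKEN",
  "QSTASH_CURRENT_SIGNING_KEY",
  "QSTASH_NEXT_SIGNING_KEY",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
] as const;

// Hypothetical helper: returns the required names that are unset or blank,
// in the same order as REQUIRED_ENV_VARS.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
}
```

In a Node.js process you could call missingEnvVars(process.env) once at startup and throw if the returned list is not empty.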
To create API endpoints in Next.js, you will use Next.js Route Handlers, which allow you to serve responses using the Web Request and Response APIs. To start creating API routes in Next.js that stream responses to the user, execute the following commands:
The -p flag creates parent directories of a directory if they're missing.
This sets up our Next.js project. Now, let's set up Clerk in the application.
Set up Clerk SDK with Next.js
Clerk has a Next.js SDK that contains helpers to make implementing the sign-in modal and managing (authenticated) sessions easier. You will add the ClerkProvider component to the global layout of your Next.js application. This is a critical component as it provides access to the active session and user context to all of Clerk’s components present anywhere in the application.
Make the following additions in app/layout.tsx to wrap the whole Next.js application with the ClerkProvider component:
Now, let's move on to configuring the Next.js middleware for managing sessions with Clerk.
Configure Next.js Middleware for Clerk
Clerk requires middleware to allow granular, per-request control over which routes (including Route Handlers) are protected by authentication. Create a middleware.ts file at the root of your project with the following:
The code above imports Clerk's clerkMiddleware helper extending the ability to mark specific routes as public (i.e. they are accessible without authentication), as ignored (i.e. authentication checks are not run on such pages), and as API Routes (i.e. they are treated by Clerk as API Endpoints). The middleware is applied to the route paths matching the matcher option in the config object, which, per the config above, is every path that is not a static asset.
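To make "all non-static asset paths" concrete, here is a rough illustration using a plain regular expression. It is only an approximation under stated assumptions: Next.js compiles matcher strings itself, the pattern below is not the one from the middleware file, and the function name isHandledByMiddleware is hypothetical.

```typescript
// Illustrative only: matches paths that do not start with /_next and do not
// contain a dot (a rough proxy for "has a file extension").
const NON_STATIC_PATH = /^\/(?!_next)[^.]*$/;

// Hypothetical helper showing which request paths such a pattern would cover.
function isHandledByMiddleware(pathname: string): boolean {
  return NON_STATIC_PATH.test(pathname);
}
```

Under this approximation, page routes and API routes such as /api/transcribe are covered, while requests for files like /favicon.ico or anything under /_next are skipped.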
Now, let's integrate shadcn/ui components in Next.js.
Integrating shadcn/ui components
To quickly prototype the user interface, you will set up shadcn/ui with Next.js. shadcn/ui is a collection of beautifully designed components that you can copy and paste into your apps. To set up shadcn/ui, execute the command below:
You will be asked a few questions to configure components.json. Choose the following:
- Yes when prompted to use TypeScript.
- Slate when prompted to choose the base color.
- Yes when prompted to use CSS variables for colors.
- Yes when prompted to proceed with writing the configuration to components.json.
Once that is done, you have set up a CLI that allows you to easily add React components to your Next.js application. Next, execute the command below to get the button, input, tooltip, and toast elements:
Once that is done, you will see a ui directory inside the app/components directory containing button.tsx, input.tsx, tooltip.tsx, toast.tsx, toaster.tsx, and use-toast.ts.
Next, open up the app/layout.tsx file, and make the following additions:
In the code changes above, you have imported the Toaster component and made sure that it is present in your entire Next.js application. It enables you to show toast notifications from anywhere in your code via the useToast React hook.
Transcribe Audio API Endpoint in Next.js App Router
Create a file named route.ts
in the app/api/transcribe
directory that transcribes an audio after fetching it from Cloudflare R2, with the following code:
The Next.js API endpoint above is for transcribing audio files using Fireworks AI. It leverages the verifySignatureAppRouter
middleware from Upstash QStash for request authentication. The handler function performs the following operations:
- Retrieves the audio file from Cloudflare R2
- Converts the audio data into a FormData object
- Sends a POST request to the Fireworks API for transcription
- Processes the API response
- Stores the transcription result in Upstash Redis
The Redis storage uses a composite key structure, combining the user's identifier with the filename, to associate transcriptions with specific users.
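The composite key can be pictured with a small sketch. Everything here is an assumption for illustration: the colon separator, the record fields, and the in-memory Map that stands in for Upstash Redis so the example runs without a database.

```typescript
// Assumed shape of what is stored per audio file; field names are hypothetical.
interface TranscriptionRecord {
  status: "scheduled" | "completed";
  text?: string;
}

// Assumed separator between the user ID and the filename.
const KEY_SEPARATOR = ":";

// Builds the composite key that ties a transcription to one user's file.
function transcriptionKey(userId: string, fileName: string): string {
  return `${userId}${KEY_SEPARATOR}${fileName}`;
}

// In-memory stand-in for Redis.
const transcriptionStore = new Map<string, TranscriptionRecord>();

// Records a finished transcription under the user's composite key.
function saveTranscription(userId: string, fileName: string, text: string): void {
  transcriptionStore.set(transcriptionKey(userId, fileName), {
    status: "completed",
    text,
  });
}
```

Because the user ID is part of the key, two users who upload files with the same name end up with separate entries.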
Now, let's create the endpoint that schedules the transcription of audio files.
Scheduling Audio Transcriptions API Endpoint in Next.js App Router
Scheduling transcriptions allows for asynchronous processing of audio files, freeing up resources for other tasks while the transcription is being performed. This is particularly useful in scenarios where multiple users require transcriptions, as it prevents resource contention and ensures that each user's request is eventually processed.
Create a file named route.ts in the app/api/schedule directory for scheduling audio transcriptions, with the following code:
The code above defines a Next.js API endpoint that schedules the transcription of audio files using Upstash QStash. It first verifies the request signature to ensure the request is from QStash, then it fetches the audio file from Cloudflare R2, converts it into a FormData object, and sends it to the Fireworks API for transcription. Upon receiving the transcription response, it updates the transcription status in Upstash Redis.
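To illustrate the scheduling side in isolation, the sketch below only assembles the options that might be handed to QStash when publishing a job aimed at the /api/transcribe route created earlier. It makes no network call. The function name buildScheduleOptions, the payload fields, and the exact option names are assumptions for illustration rather than the project's actual code.

```typescript
// Payload the transcribe endpoint would receive; field names are hypothetical.
interface TranscribeJob {
  userId: string;
  fileName: string;
}

// Assumed general shape of a publish request: destination URL, JSON body,
// and a delay in seconds before delivery.
interface ScheduleOptions {
  url: string;
  body: TranscribeJob;
  delay: number;
}

// Hypothetical helper: points the job at the app's transcribe route and
// normalizes the delay to a non-negative whole number of seconds.
function buildScheduleOptions(
  appOrigin: string,
  job: TranscribeJob,
  delaySeconds = 0,
): ScheduleOptions {
  const origin = appOrigin.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${origin}/api/transcribe`,
    body: job,
    delay: Math.max(0, Math.floor(delaySeconds)),
  };
}
```

Keeping this step as a pure function makes it easy to check the destination URL and payload before wiring it to a real QStash client.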
Now, let's create the endpoint that generates presigned URLs for uploaded files.
Using Presigned URLs for Large Audio File Uploads
When dealing with large audio files, it's crucial to implement an efficient upload mechanism. One effective approach is to use presigned URLs, which allow for direct uploads to cloud storage services like Amazon S3 or Cloudflare R2. This method offers several advantages over server-side uploads:
- Reduced server load: The audio file is uploaded directly to the storage service, bypassing your application server.
- Improved performance: Large files can be uploaded faster and more reliably.
- Scalability: This approach can handle multiple large file uploads simultaneously without overwhelming your server resources.
Create a file named route.ts in the app/api/upload directory for generating presigned URLs for uploaded files, with the following code:
The code above defines a Next.js API route handler for GET requests. The handler authenticates the user via Clerk, extracts fileName and contentType from query parameters, and generates a signed URL for S3 object upload. It then returns a signed object as a JSON response. The route is secured with Clerk, requiring user authentication and validating necessary parameters before processing the upload.
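Since Cloudflare R2 is accessed through an S3-compatible API, the S3 client has to be pointed at your account's R2 endpoint. The sketch below shows one way the environment variables from the .env file could map onto such a client configuration. The helper name r2ClientConfig and the two interfaces are hypothetical, and the mapping of variables to fields is an assumption, not the code used in the route above.

```typescript
// Subset of the tutorial's environment variables needed to reach R2.
interface R2Env {
  CLOUDFLARE_R2_ACCOUNT_ID: string;
  AWS_REGION_NAME: string;
  AWS_KEY_ID: string;
  AWS_SECRET_ACCESS_KEY: string;
}

// Plain object in the shape an S3 client constructor typically accepts.
interface R2ClientConfig {
  region: string;
  endpoint: string;
  credentials: { accessKeyId: string; secretAccessKey: string };
}

// Hypothetical helper: derives the account-specific R2 endpoint and
// credentials from the environment.
function r2ClientConfig(env: R2Env): R2ClientConfig {
  return {
    region: env.AWS_REGION_NAME,
    endpoint: `https://${env.CLOUDFLARE_R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
    credentials: {
      accessKeyId: env.AWS_KEY_ID,
      secretAccessKey: env.AWS_SECRET_ACCESS_KEY,
    },
  };
}
```

The resulting object could then be passed to the S3 client from @aws-sdk/client-s3 before asking the presigner for an upload URL.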
Now, let's create the endpoint that retrieves signed URLs for uploaded files.
Retrieving Signed URLs for Cloudflare R2 Files
To manage per-user uploads efficiently, you will want to prefix filenames with the user's ID obtained from Clerk. This approach simplifies file organization and access control in Cloudflare R2. When uploading, you concatenate the user ID with the filename using a unique separator. For retrieval, you verify that the requested filename starts with the authenticated user's ID, ensuring users can only access their own files. This method provides a simple yet effective way to segregate and secure user data in cloud storage.
Create a file named route.ts in the app/api/get directory for retrieving signed URLs for uploaded files, with the following code:
The code above uses Clerk for authentication and extracts a fileName from query parameters. It includes additional security by checking if the filename starts with the user's ID. If all checks pass, it calls a getS3Object function to generate a signed URL for retrieving the object and returns this URL in the response.
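The prefix check is worth spelling out, because a bare startsWith on the user ID alone would let one ID match another that merely begins with the same characters. The sketch below includes the separator in the comparison to avoid that. The "/" separator and both function names are assumptions for illustration.

```typescript
// Assumed separator between the Clerk user ID and the original filename.
const USER_FILE_SEPARATOR = "/";

// Name under which a user's upload would be stored.
function toUserFileName(userId: string, fileName: string): string {
  return `${userId}${USER_FILE_SEPARATOR}${fileName}`;
}

// True only when storedName begins with this exact user ID followed by the
// separator, so "user_1" does not match files belonging to "user_12".
function ownsFile(userId: string, storedName: string): boolean {
  return (
    userId.length > 0 &&
    storedName.startsWith(`${userId}${USER_FILE_SEPARATOR}`)
  );
}
```

A route handler could return a 403 response whenever ownsFile is false for the authenticated user and the requested name.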
Now, let's create the endpoint that retrieves all the transcriptions done for a particular user in a paginated form.
Transcriptions History API Endpoint in Next.js App Router
Create a file named route.ts in the app/api/history directory that retrieves all the transcriptions done for a particular user in a paginated form, with the following code:
The Next.js API endpoint retrieves the transcription history for authenticated users by checking their authentication status with Clerk. It uses Redis to fetch the data in a paginated format and returns the transcription history as a JSON response, allowing clients to specify a starting point for retrieval.
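Pagination from a client-supplied starting point can be sketched independently of Redis. The example below pages over an in-memory array and reports where the next page would begin. The function name paginate, the default page size of 10, and the response shape are assumptions, since they are not specified here.

```typescript
// One page of results plus the index to request next, or null at the end.
interface Page<T> {
  items: T[];
  next: number | null;
}

// Hypothetical helper: returns up to pageSize items beginning at start.
// Invalid or negative start values fall back to the beginning of the list.
function paginate<T>(all: readonly T[], start: number, pageSize = 10): Page<T> {
  const from = Number.isFinite(start) ? Math.max(0, Math.floor(start)) : 0;
  const end = from + pageSize;
  return {
    items: all.slice(from, end),
    next: end < all.length ? end : null,
  };
}
```

A client would keep requesting pages with the returned next value until it comes back as null.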
Now, let's create the interface for the application.
Building the application interface
Open the app/page.tsx file, and replace the existing code with the following:
The code above implements file upload functionality for audio files. It uses Clerk for authentication, manages state with React hooks, and interacts with API endpoints for file upload and audio history retrieval. The component renders a user interface with conditional elements based on the user's authentication status and handles file selection and upload processes.
Now, let's add the remaining components to the interface.
The code above displays a list of uploaded audio files with their transcription status, showing either the completed transcription or a loading indicator. The interface includes authentication features, allowing users to sign in and upload audio files. It also provides buttons for uploading new audio files and refreshing the list of transcriptions.
That was a lot of learning! You’re all done now ✨
Deploy to Vercel
The repository is now ready to deploy to Vercel. Use the following steps to deploy 👇🏻
- Start by creating a GitHub repository containing your app's code.
- Then, navigate to the Vercel Dashboard and create a New Project.
- Link the new project to the GitHub repository you just created.
- In Settings, update the Environment Variables to match those in your local .env file.
- Deploy! 🚀
More Information
For more detailed insights, explore the references cited in this post.
Conclusion
In this blog, you learned how to build an audio transcription system using Upstash QStash, Upstash Redis, Fireworks AI, Clerk, Next.js App Router, Cloudflare R2, and Vercel. You set up Upstash QStash to schedule transcriptions, used Upstash Redis to maintain per-user transcription history and audio file references, configured Clerk for user authentication, and integrated the Fireworks API to generate AI-powered transcriptions. The tutorial also covered how to upload files to Cloudflare R2 using presigned URLs and retrieve them using signed URLs. Finally, you deployed the application on Vercel.