Add Algolia Search Functionality to Docusaurus
This article introduces how to add search functionality to a Docusaurus website using Algolia.
Algolia Configuration Tutorial 🔎
I referred to an existing article [1] to add a search function to my website. However, I ran into several problems during implementation, and the original article's explanation was not detailed enough, so I decided to write a more detailed post documenting the entire configuration process.
Obtain the Algolia API Keys
The goal of this section is to obtain these three Algolia credentials:
1. Application ID
2. Search API Key
3. Admin API Key
API Name Correlation
The key names shown in the Algolia dashboard do not exactly match the names used in the code. The correspondence is as follows:
- Application ID -> `appId` (in docusaurus.config.js)
- Search API Key -> `apiKey` (in docusaurus.config.js)
- Application ID -> `ALGOLIA_APP_ID` (GitHub Actions secret)
- Admin API Key -> `ALGOLIA_API_KEY` (GitHub Actions secret)
Register an Algolia account
- Open Algolia's official website and sign up or log in with your GitHub account.
- Click `Not now...` to skip the initial setup.
Obtain the API Keys
- Open the `Settings` page.
- View the API keys.
These three keys are exactly what we need. You can keep this page open for convenient copying and pasting later. At this point, all three keys should have been obtained successfully.
Do not share the Admin API Key! It is sensitive information, and leaking it may lead to security issues!
Only two API keys were found?
When viewing the API keys, you may notice that there are only two entries, `Application ID` and `Search API Key`, and no `Admin API Key`?! This indicates that you did not follow the earlier step precisely: what you are looking at is the default `Docsearch` application created by Algolia, and that application currently has no admin key. (Following the official tutorial step by step should produce one, but I haven't tried it. 😱) The solution is to create another application here: `Settings/API Keys`.
After filling in the name, keep the other settings at their defaults and simply click `Next: Review & Confirm` -> `Create Application`. The keys for the new application are obtained the same way as above.
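Before wiring the keys into GitHub, you may want a quick sanity check that they actually work. Below is a minimal Node.js sketch (Node 18+; the filename and the environment-variable convention are my own, not part of the original setup) that lists the application's indices via Algolia's REST API:

```js
// check-algolia.mjs — quick sanity check for the Algolia credentials.
// Assumes ALGOLIA_APP_ID and ALGOLIA_API_KEY are set in the environment.
const appId = process.env.ALGOLIA_APP_ID;
const apiKey = process.env.ALGOLIA_API_KEY;

// List the application's indices (requires the Admin API Key).
const res = await fetch(`https://${appId}-dsn.algolia.net/1/indexes`, {
  headers: {
    'X-Algolia-Application-Id': appId,
    'X-Algolia-API-Key': apiKey,
  },
});

if (!res.ok) {
  console.error(`Request failed: ${res.status} ${res.statusText}`);
  process.exit(1);
}

const { items } = await res.json();
console.log(`Credentials OK: ${items.length} index(es) found.`);
```

Run it with `ALGOLIA_APP_ID=... ALGOLIA_API_KEY=... node check-algolia.mjs`; a fresh application will simply report zero indices.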
Create Actions Secrets
- Open the GitHub repository that hosts the website's code and go to `Settings/Secrets and variables/Actions`.
- Go back to the Algolia page and copy the `Application ID`:
- Go back to the GitHub page, enter the name `ALGOLIA_APP_ID`, paste the value, and click `Add secret`.
- Go back to the Algolia page and copy the `Admin API Key`:
- Go back to the GitHub page, enter the name `ALGOLIA_API_KEY`, paste the value, and click `Add secret`.
- Expected Outcome:
Create a GitHub Actions workflow
- Create a file named `.github/workflows/docsearch.yml` in the root directory of your local project:
```yaml
name: DocSearch Scraper

# Define the triggering conditions for the workflow
on:
  # Allow manual triggering of the workflow
  workflow_dispatch:
  # Scheduled run (UTC time)
  schedule:
    - cron: '0 0 * * *'

# Define workflow jobs
jobs:
  algolia:
    # Runner environment
    runs-on: ubuntu-latest
    steps:
      # Check out the code repository
      - uses: actions/checkout@v3

      # Install the jq tool for handling JSON
      - name: Install jq
        run: sudo apt-get install -y jq

      # Verify that the configuration file exists
      - name: Validate docsearch.json exists
        run: |
          if [ ! -f "docsearch.json" ]; then
            echo "docsearch.json not found!"
            exit 1
          fi

      # Read the Algolia configuration and expose it as a step output
      - name: Get the content of docsearch.json as config
        id: algolia_config
        run: echo "config=$(cat docsearch.json | jq -r tostring)" >> $GITHUB_OUTPUT

      # Run the DocSearch scraper
      - name: Run algolia/docsearch-scraper image
        env:
          # Read the Algolia credentials from the repository secrets
          ALGOLIA_APP_ID: ${{ secrets.ALGOLIA_APP_ID }}
          ALGOLIA_API_KEY: ${{ secrets.ALGOLIA_API_KEY }}
          CONFIG: ${{ steps.algolia_config.outputs.config }}
        run: |
          echo "Starting DocSearch scraper..."
          docker run \
            --env APPLICATION_ID=${ALGOLIA_APP_ID} \
            --env API_KEY=${ALGOLIA_API_KEY} \
            --env "CONFIG=${CONFIG}" \
            algolia/docsearch-scraper
          echo "DocSearch scraper completed successfully."
```
If you want this workflow to be triggered automatically after deployment, you can refer to this article; a minimal sketch of that kind of trigger follows below. However, I think manual and scheduled triggering are already sufficient, so I did not try other trigger methods myself.
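For reference, GitHub Actions supports this kind of chaining via the `workflow_run` trigger. In the sketch below, the workflow name "Deploy" is a placeholder: it must match the `name:` of your own deploy workflow, and note that `workflow_run` only fires for workflows on the default branch:

```yaml
# Example trigger: run this workflow after a deploy workflow finishes.
on:
  workflow_run:
    # "Deploy" is a placeholder; use the name of your own deploy workflow.
    workflows: ["Deploy"]
    types: [completed]
```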
Create docsearch.json
The parts you need to customize are `index_name`, `start_urls`, and `sitemap_urls`. `index_name` can be anything you like, but it must match the later `docusaurus.config.js` configuration; `start_urls` and `sitemap_urls` should be changed to your own site's links.
```json
{
  "index_name": "test-site",
  "start_urls": ["https://www.eurekashadow.xin/"],
  "sitemap_urls": ["https://www.eurekashadow.xin/sitemap.xml"],
  "js_render": true,
  "js_wait": 1,
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1, article h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "custom_settings": {
    "attributesForFaceting": ["type", "lang", "language", "version", "docusaurus_tag"],
    "attributesToRetrieve": ["hierarchy", "content", "anchor", "url", "url_without_anchor", "type"],
    "attributesToHighlight": ["hierarchy", "content"],
    "attributesToSnippet": ["content:10"],
    "camelCaseAttributes": ["hierarchy", "content"],
    "searchableAttributes": [
      "unordered(hierarchy.lvl0)",
      "unordered(hierarchy.lvl1)",
      "unordered(hierarchy.lvl2)",
      "unordered(hierarchy.lvl3)",
      "unordered(hierarchy.lvl4)",
      "unordered(hierarchy.lvl5)",
      "unordered(hierarchy.lvl6)",
      "content"
    ],
    "distinct": true,
    "attributeForDistinct": "url",
    "customRanking": ["desc(weight.pageRank)", "desc(weight.level)", "asc(weight.position)"],
    "ranking": ["words", "filters", "typo", "attribute", "proximity", "exact", "custom"],
    "highlightPreTag": "<span class='algolia-docsearch-suggestion--highlight'>",
    "highlightPostTag": "</span>",
    "minWordSizefor1Typo": 3,
    "minWordSizefor2Typos": 7,
    "allowTyposOnNumericTokens": false,
    "minProximity": 1,
    "ignorePlurals": true,
    "advancedSyntax": true,
    "attributeCriteriaComputedByMinProximity": true,
    "removeWordsIfNoResults": "allOptional",
    "separatorsToIndex": "_",
    "synonyms": [
      ["js", "javascript"],
      ["ts", "typescript"]
    ]
  }
}
```
Note: the two settings `js_render` and `js_wait` are extremely important. Without them, Algolia may only be able to crawl the homepage. If you open the browser's dev tools (F12) on a page other than the homepage, you may find that the server actually returned a 404 even though the page displays normally; the rendered page no longer matches the raw source, so the crawler cannot extract the complete data. These two settings tell the scraper to crawl only after JavaScript rendering has finished, so all the data can be collected normally.
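To catch configuration mistakes before the workflow ever runs, you can validate the file locally. Here is a minimal Node.js sketch (the filename and the specific checks are my own; adjust them to taste):

```js
// validate-docsearch.mjs — quick local check of docsearch.json.
// The checks below are illustrative, not exhaustive.
import { readFileSync } from 'node:fs';

const config = JSON.parse(readFileSync('docsearch.json', 'utf8'));

// Fields the scraper cannot work without.
for (const key of ['index_name', 'start_urls', 'selectors']) {
  if (!(key in config)) throw new Error(`Missing required field: ${key}`);
}

// Warn if js_render is off: a common cause of homepage-only crawls.
if (!config.js_render) {
  console.warn('Warning: js_render is not enabled; only the homepage may be crawled.');
}

console.log(`docsearch.json looks OK (index: ${config.index_name}).`);
```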
Configure docusaurus.config.js
- Position illustration:
- New code:
```js
// Add Algolia search
algolia: {
  // Application ID
  appId: '********',
  // Search API Key
  apiKey: '*******************',
  // Must match the index_name in docsearch.json
  indexName: 'test-site',
  searchPagePath: 'search',
  contextualSearch: true,
},
```
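In case the position illustration is unclear: the `algolia` block lives inside `themeConfig`. A trimmed sketch of how the surrounding `docusaurus.config.js` might look (all other fields omitted):

```js
// docusaurus.config.js (abridged; only the parts relevant here)
module.exports = {
  // ... site metadata, presets, plugins ...
  themeConfig: {
    // ... navbar, footer, etc. ...
    algolia: {
      appId: '********',             // Application ID
      apiKey: '*******************', // Search API Key (public, safe to commit)
      indexName: 'test-site',        // must match index_name in docsearch.json
      searchPagePath: 'search',
      contextualSearch: true,
    },
  },
};
```

Note that the `apiKey` here is the search-only key, which is designed to be public; the Admin API Key must never appear in this file.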
Don't remember which name corresponds to which? See the API Name Correlation list above.
Website Deployment
- Push the code to the remote repository with `git push`.
Manually trigger the workflow to retrieve information
- After the deployment on Vercel has completed, go back to the GitHub repository, click `Actions`, select the `DocSearch Scraper` workflow, and click `Run workflow`. This triggers the workflow, which crawls the website and generates the index that powers the Algolia search.
- Operation completed:
View Algolia Records
- Click `Data Source`.
- Click `Indices`. As shown in the figure below, the index has been generated: under the index `test-site`, a total of 134 records were found. The number of records depends on the amount of website content; this demo site has only 134 records, while a normal index may contain thousands or even tens of thousands.

Note that the configuration only counts as successful when the records here actually match your website's content. If the site has plenty of content but only a handful of records show up, something has gone wrong somewhere; most likely the `js_render` and `js_wait` settings were not added.
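You can also confirm the records are searchable by querying the index directly. Below is a minimal Node.js sketch against Algolia's search endpoint (it uses the Search API Key, not the admin key; the environment-variable name and the query term are just examples):

```js
// search-test.mjs — query the index to confirm records are searchable.
// Assumes ALGOLIA_APP_ID and ALGOLIA_SEARCH_KEY are set in the environment.
const appId = process.env.ALGOLIA_APP_ID;
const searchKey = process.env.ALGOLIA_SEARCH_KEY;

const res = await fetch(
  `https://${appId}-dsn.algolia.net/1/indexes/test-site/query`,
  {
    method: 'POST',
    headers: {
      'X-Algolia-Application-Id': appId,
      'X-Algolia-API-Key': searchKey,
      'Content-Type': 'application/json',
    },
    // Search for an example term; replace it with a word from your site.
    body: JSON.stringify({ params: 'query=docusaurus&hitsPerPage=3' }),
  }
);

const { nbHits, hits } = await res.json();
console.log(`${nbHits} hit(s); first results:`);
for (const hit of hits) console.log('-', hit.url);
```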
Deploy the website to GitHub Pages
In fact, you can already check the results locally. The point of deploying and then checking is to further confirm that the results are correct; after all, sometimes everything runs fine locally but breaks once deployed to GitHub Pages. (Looking at you, i18n! That annoying flashing issue still hasn't been resolved... 😔)
- Deployment completed:
Result
Mission accomplished! Now the website can use the search function! 🎉
This process was quite involved. Thanks to the reference below: without that article as a guide, I'm afraid I would not have been able to complete this search function. But even with a reference it wasn't done overnight; I spent a lot of time and took many detours to get search working. I hope this summary is helpful to anyone who comes across it (maybe even my future self? 😥)
📚 References
[1] AlanWang's Blog
