Exporting a .bib File from Google Scholar and Generating TOML-formatted Markdown Files

Introduction

This post shows how I added my publications to my website, which is generated by Hugo. It gives step-by-step instructions for exporting a .bib file from Google Scholar and converting it into Markdown files with TOML front matter using Python.

Step 1: Exporting .bib from Google Scholar

Log in to Google Scholar: Head over to Google Scholar and sign in using your credentials.
Access “My Library”: Here you will find all your saved articles and citations.
Select Articles: Choose the articles you wish to export by ticking the checkboxes next to them.
Click the Export Button: This button, usually represented by quotation marks, will give you different export options.
Choose BibTeX: Select the BibTeX option to export your articles in the .bib format.
Save the File: The BibTeX formatted content will be displayed in a new window. Copy this content and save it in a .bib file, for example, references.bib.
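
Before moving on, it can help to confirm that the saved file contains the entries you expect. The following is a minimal sketch using only the standard library; the helper name list_bib_keys and the two sample entries are made up for illustration, and the regular expression is a rough check, not a full BibTeX parser.

```python
import re

# Matches the start of a BibTeX entry such as "@article{doe2021example,"
# and captures the entry type and the citation key.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def list_bib_keys(bib_text):
    """Return (entry_type, key) pairs found in BibTeX source text."""
    return [(kind.lower(), key) for kind, key in ENTRY_START.findall(bib_text)]


# Hypothetical sample entries, shaped like a Google Scholar export.
sample = """
@article{doe2021example,
  title={An {E}xample Title},
  author={Doe, Jane and Roe, Richard},
  journal={Journal of Examples},
  year={2021}
}
@inproceedings{roe2019another,
  title={Another Example},
  author={Roe, Richard},
  booktitle={Proceedings of Examples},
  year={2019}
}
"""

for kind, key in list_bib_keys(sample):
    print(kind, key)
```

The citation keys matter because the script in Step 2 uses each key as the folder name of the generated page.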

Step 2: Converting .bib to TOML-formatted Markdown with Python

Now that we have our .bib file, let’s use Python to parse the file and create individual markdown files with TOML front matter.

a. Setting Up the Environment

Before delving into the code, make sure you have the required Python libraries:

pip install pybtex pylatexenc

b. Code Breakdown

i. Key Imports:

pathlib.Path: Used for directory and file path operations.
bibtex: From pybtex, used to parse the .bib file.
LatexNodes2Text: From pylatexenc, used to convert LaTeX to Unicode.

from pathlib import Path

from pybtex.database.input import bibtex
from pylatexenc.latex2text import LatexNodes2Text

ii. LaTeX to Unicode Conversion:

We’ll use the pylatexenc library to easily convert LaTeX-specific text into Unicode. This is especially useful for author names, titles, or sources that might use LaTeX-style formatting.

def latex_to_unicode(text):
    return LatexNodes2Text().latex_to_text(text)

iii. Generating Markdown:

The create_or_update_md function converts each .bib entry into a Markdown file with TOML front matter. Despite its name, it only creates files: if index.md already exists for an entry, it is left untouched, so edits you make by hand survive a re-run. By default the function sets hidden = true. A hidden publication is not listed as an article, but it still appears as a non-clickable entry in the partial.

def create_or_update_md(entry):
    # Extract common fields; title and journal may contain LaTeX markup
    key = entry.key
    title = latex_to_unicode(entry.fields.get('title', ''))
    year = entry.fields.get('year', '')
    journal = latex_to_unicode(entry.fields.get('journal', entry.fields.get('booktitle', '')))
    volume = entry.fields.get('volume', '')
    pages = entry.fields.get('pages', '')
    publisher = entry.fields.get('publisher', '')
    url = entry.fields.get('url', '')

    # Extract the authors and convert each name from LaTeX to Unicode
    authors = [latex_to_unicode(str(author)) for author in entry.persons.get('author', [])]

    # Define the path
    folder_path = Path(f"content/publications/{key}")
    file_path = folder_path / "index.md"

    # Check if file exists
    if not file_path.exists():
        # If file doesn't exist, create a new markdown file with TOML front matter
        folder_path.mkdir(parents=True, exist_ok=True)
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write("+++\n")
            f.write(f'title = "{title}"\n')
            f.write('hidden = true\n')
            f.write(f'authors  = {authors}\n')
            f.write(f'date = {year}-01-01\n')
            f.write(f'journal = "{journal}"\n')
            f.write(f'volume = "{volume}"\n')
            f.write(f'pages = "{pages}"\n')
            f.write(f'publisher = "{publisher}"\n')
            f.write(f'url = "{url}"\n')
            f.write("+++\n\n")
            f.write(f"Summary about {title}.")
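
One thing to watch for: the function writes field values straight between double quotes, so a title that itself contains a double quote or a backslash would produce invalid TOML, and an entry without a year would produce an invalid date. A minimal sketch of helpers that could guard against this is shown below. The names toml_string, toml_string_array and toml_date are hypothetical, and using json.dumps for quoting is an assumption that holds for ordinary text because JSON strings and TOML basic strings share the same escapes for quotes, backslashes and common control characters.

```python
import json


def toml_string(value):
    """Quote a value as a TOML basic string.

    Assumption: json.dumps output is reused as the quoted form, since it
    escapes double quotes, backslashes and control characters.
    """
    return json.dumps(str(value), ensure_ascii=False)


def toml_string_array(values):
    """Render a list of strings as a TOML array of basic strings."""
    return "[" + ", ".join(toml_string(v) for v in values) + "]"


def toml_date(year):
    """Return 'YYYY-01-01' for a four-digit year, or None if unusable."""
    year = str(year).strip()
    if len(year) == 4 and year.isascii() and year.isdigit():
        return f"{year}-01-01"
    return None


title = 'A "quoted" title'
print("title = " + toml_string(title))
print("authors = " + toml_string_array(["Doe, Jane", "Roe, Richard"]))
print("date =", toml_date("2021"))
```

With helpers like these, a line such as f.write(f'title = "{title}"\n') would become f.write(f'title = {toml_string(title)}\n'), and the date line would be skipped when toml_date returns None.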

iv. Parsing & Execution:

Here, we parse the .bib file and process each entry:

def main():
    parser = bibtex.Parser()
    bib_data = parser.parse_file("resources/references.bib")

    for entry in bib_data.entries.values():
        create_or_update_md(entry)

if __name__ == "__main__":
    main()
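
After a run, a quick way to see what was generated is to list the folders under content/publications that contain an index.md. This is a small sketch using only pathlib; the function name list_generated is made up for illustration, and the default path simply mirrors the one used in create_or_update_md.

```python
from pathlib import Path


def list_generated(root="content/publications"):
    """Return the sorted names of publication folders that contain an index.md."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.parent.name for p in root.glob("*/index.md"))


if __name__ == "__main__":
    for key in list_generated():
        print(key)
```

Each printed name should correspond to one citation key from the .bib file.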

Step 3: Displaying in Hugo

You will need Font Awesome for the icon used in the partial. Add the stylesheet link below to your site's head, either in the partial itself or in an included template; just make sure it is loaded somewhere.

<head>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css">
</head>

After running the Python script, you’ll have a collection of markdown files ready to be displayed in Hugo. Here’s how you can showcase them using the Hugo partial below.

{{- $pages := where .Site.RegularPages "Section" "publications"}}
{{- $sortedPages := $pages.ByDate.Reverse }}
{{- $latestPublications := $sortedPages | first 5 }}

<div class="latest-publications">
    <h2>Latest Publications</h2>

    <ul class="">
        {{- range $latestPublications }}
        <li>
            {{ $currentPublication := . }}

            {{ if ne .Params.hidden true }}
            <a href="{{ .Permalink }}">{{ .Title }} <i class="fas fa-file-alt"></i></a> ({{ .Date.Format "2006" }})
            {{ else }}
            {{ .Title }} ({{ .Date.Format "2006" }})
            {{ end }}
            <p>
                {{ range $i, $author := .Params.authors }}
                {{ if eq $i 0 }}{{ $author }}{{ if gt (len $currentPublication.Params.authors) 1 }} and {{ end }}{{ end }}
                {{ if eq $i 1 }}{{ $author }}{{ if gt (len $currentPublication.Params.authors) 2 }} et al.{{ end }}{{ end }}
                {{ end }}
            </p>
        </li>
        {{- end }}
    </ul>

</div>

Simply embed this partial into your desired Hugo template, and your publications will be presented.
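
The author line in the partial is the densest part of the template, so here is the same idea expressed in Python. This is only an illustration of the intended output, not something Hugo runs; the function name format_authors is hypothetical, and adding "et al." only when there are more than two authors is my assumption about the intent.

```python
def format_authors(authors):
    """Shorten an author list: one name, two names joined by 'and',
    or the first two names followed by 'et al.' for longer lists."""
    if not authors:
        return ""
    if len(authors) == 1:
        return authors[0]
    short = f"{authors[0]} and {authors[1]}"
    # Only add "et al." when authors beyond the first two are dropped.
    return short + " et al." if len(authors) > 2 else short


print(format_authors(["Doe, Jane"]))
print(format_authors(["Doe, Jane", "Roe, Richard"]))
print(format_authors(["Doe, Jane", "Roe, Richard", "Poe, Edgar"]))
```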

The Complete Python Script:

For those who want to dive straight in, here’s the full Python script:

import re
from pathlib import Path

from pybtex.database.input import bibtex
from pylatexenc.latex2text import LatexNodes2Text


def latex_to_unicode(text):
    return LatexNodes2Text().latex_to_text(text)


def create_or_update_md(entry):
    # Extract common fields; title and journal may contain LaTeX markup
    key = entry.key
    title = latex_to_unicode(entry.fields.get('title', ''))
    year = entry.fields.get('year', '')
    journal = latex_to_unicode(entry.fields.get('journal', entry.fields.get('booktitle', '')))
    volume = entry.fields.get('volume', '')
    pages = entry.fields.get('pages', '')
    publisher = entry.fields.get('publisher', '')
    url = entry.fields.get('url', '')

    # Extract the authors and convert each name from LaTeX to Unicode
    authors = [latex_to_unicode(str(author)) for author in entry.persons.get('author', [])]

    # Define the path
    folder_path = Path(f"content/publications/{key}")
    file_path = folder_path / "index.md"

    # Check if file exists
    if not file_path.exists():
        # If file doesn't exist, create a new markdown file with TOML front matter
        folder_path.mkdir(parents=True, exist_ok=True)
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write("+++\n")
            f.write(f'title = "{title}"\n')
            f.write('hidden = true\n')
            f.write(f'authors  = {authors}\n')
            f.write(f'date = {year}-01-01\n')
            f.write(f'journal = "{journal}"\n')
            f.write(f'volume = "{volume}"\n')
            f.write(f'pages = "{pages}"\n')
            f.write(f'publisher = "{publisher}"\n')
            f.write(f'url = "{url}"\n')
            f.write("+++\n\n")
            f.write(f"Summary about {title}.")


def get_author_list(entry):
    if 'author' in entry.persons:
        authors = entry.persons['author']
        author_str = ' and '.join(str(author) for author in authors)
        return latex_to_unicode(author_str)
    return ''


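def update_field(content, field, value):
    # Helper not called by main(). If the TOML field exists, update it;
    # if not, return the content as is.
    pattern = rf'^{re.escape(field)}\s*=.*$'
    if re.search(pattern, content, flags=re.MULTILINE):
        content = re.sub(pattern, lambda m: f'{field} = "{value}"', content, flags=re.MULTILINE)
    return content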


def main():
    parser = bibtex.Parser()
    bib_data = parser.parse_file("resources/references.bib")

    for entry in bib_data.entries.values():
        create_or_update_md(entry)


if __name__ == "__main__":
    main()
