:: krowemoh

Saturday | 08 NOV 2025


2025-10-23
Solr in Production

solr, search

Setting up Solr in production was easier than setting up ZooKeeper, likely because I had already played with it in development mode.

Make sure to set up ZooKeeper before Solr, because the Solr configuration below points at the ZooKeeper ensemble.

The key here was to follow the instructions.

Installation

To install Solr as a service, we need Java, lsof and chkconfig:

yum install java-11-openjdk-devel
yum install chkconfig
yum install lsof
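Before running the installer, it can be worth confirming these tools actually ended up on the PATH. A minimal sketch; `missing_cmds` is my own hypothetical helper, not part of Solr. It prints each command it cannot find and returns non-zero if anything is missing.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are not on the PATH.
# Prints each missing command on its own line; returns 1 if any are missing.
missing_cmds() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example on the Solr host:
#   missing_cmds java lsof chkconfig || echo "install the packages above first"
```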

We can then download the Solr release:

wget "https://dlcdn.apache.org/solr/solr/9.9.0/solr-9.9.0.tgz"
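Apache publishes a SHA-512 checksum alongside each release, so the tarball can be checked before installing. A sketch that assumes the checksum file is `solr-9.9.0.tgz.sha512` on downloads.apache.org (older releases move to archive.apache.org) and that its first whitespace-separated field is the lowercase hex digest; check the real file's format. If it is in `sha512sum` format, `sha512sum -c solr-9.9.0.tgz.sha512` does the same job. `verify_sha512` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-512 digest with the first field of a
# checksum file. Returns 0 on match, 1 otherwise. Requires sha512sum (coreutils).
verify_sha512() {
    file=$1
    sumfile=$2
    actual=$(sha512sum "$file" | awk '{print $1}')
    expected=$(awk '{print $1; exit}' "$sumfile")
    [ -n "$expected" ] && [ "$actual" = "$expected" ]
}

# Example (URL is an assumption, see above):
#   wget "https://downloads.apache.org/solr/solr/9.9.0/solr-9.9.0.tgz.sha512"
#   verify_sha512 solr-9.9.0.tgz solr-9.9.0.tgz.sha512 && echo "checksum OK"
```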

Next, we extract just the install script from the tarball:

tar xzf solr-9.9.0.tgz solr-9.9.0/bin/install_solr_service.sh --strip-components=2
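`--strip-components=2` drops the leading `solr-9.9.0/bin/` from the member's path, so the script lands in the current directory. The illustration below reproduces that on a throwaway archive; the `demo-1.0` names are made up, and it assumes GNU tar, which accepts the option after the member name just like the command above.

```shell
#!/bin/sh
# Build a tiny archive shaped like the Solr tarball, then extract a single member
# with --strip-components=2, mirroring the command above. Prints the extracted path.
strip_demo() {
    work=$(mktemp -d)
    mkdir -p "$work/src/demo-1.0/bin" "$work/out"
    printf '#!/bin/sh\necho installer\n' > "$work/src/demo-1.0/bin/install.sh"
    tar czf "$work/demo-1.0.tgz" -C "$work/src" demo-1.0
    # Without --strip-components this would create out/demo-1.0/bin/install.sh.
    (cd "$work/out" && tar xzf "$work/demo-1.0.tgz" demo-1.0/bin/install.sh --strip-components=2)
    (cd "$work/out" && find . -type f)
    rm -rf "$work"
}
```

Running `strip_demo` prints `./install.sh`: only the file itself is written, with no `demo-1.0/bin/` directories around it.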

Once we have the install script, we run it, passing the tarball to install:

bash ./install_solr_service.sh solr-9.9.0.tgz

This should create the solr user and set up all the directories: Solr stores its data in /var/solr/data and writes its logs to /var/solr/logs.
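To confirm the installer produced what we expect, the paths used elsewhere in this post can be checked directly. A sketch; `check_solr_layout` is my own hypothetical helper, and it takes an optional root prefix only so it can be tried against a scratch directory instead of the real filesystem.

```shell
#!/bin/sh
# Hypothetical helper: check that the paths this post relies on exist.
# $1 is an optional root prefix (default: empty, i.e. the real filesystem).
# Prints each missing path; returns 1 if anything is missing.
check_solr_layout() {
    root=${1:-}
    rc=0
    for path in /opt/solr/bin/solr /var/solr/data /var/solr/logs /etc/default/solr.in.sh; do
        if [ ! -e "$root$path" ]; then
            echo "missing: $root$path"
            rc=1
        fi
    done
    return "$rc"
}

# Example on the server:
#   check_solr_layout && id solr
```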

Configuration

At this point Solr is already running. However, we still need to point it at ZooKeeper and enable the extraction and ltr modules.

We need to update /etc/default/solr.in.sh:

SOLR_PID_DIR="/var/solr"
SOLR_HOME="/var/solr/data"
LOG4J_PROPS="/var/solr/log4j2.xml"
SOLR_LOGS_DIR="/var/solr/logs"
SOLR_PORT="8983"

SOLR_MODULES=extraction,ltr
ZK_HOST="localhost:2181,localhost:2182,localhost:2183"

I added the last two lines, SOLR_MODULES and ZK_HOST, to the bottom of the config; the other lines were already there from the installer.
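Editing the file by hand is fine once, but on a rebuild it is easy to append the same lines twice. A sketch of an idempotent alternative, under my own naming (`set_solr_var`, `zk_hosts`): it replaces an existing uncommented `KEY=` line or appends a new one, and builds the ZK_HOST string from the three local ZooKeeper ports used above. It assumes GNU sed for `-i`.

```shell
#!/bin/sh
# Hypothetical helper: set KEY="VALUE" in a solr.in.sh-style file.
# Replaces an existing uncommented KEY= line, otherwise appends one.
# Assumes GNU sed (-i) and that VALUE contains no '|', '&' or '\' characters.
set_solr_var() {
    file=$1 key=$2 value=$3
    if grep -q "^$key=" "$file"; then
        sed -i "s|^$key=.*|$key=\"$value\"|" "$file"
    else
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Hypothetical helper: join host:port pairs into a ZooKeeper connection string.
zk_hosts() {
    host=$1
    shift
    out=
    for port in "$@"; do
        out="${out:+$out,}$host:$port"
    done
    printf '%s\n' "$out"
}

# Example (as root):
#   set_solr_var /etc/default/solr.in.sh SOLR_MODULES "extraction,ltr"
#   set_solr_var /etc/default/solr.in.sh ZK_HOST "$(zk_hosts localhost 2181 2182 2183)"
```

Commented-out defaults such as `#ZK_HOST=""` are left alone, since only lines that start with the key itself are replaced.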

Once the config is updated, we can restart Solr:

service solr restart
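Solr may not be ready for requests the moment the restart command returns, so a script that goes straight on to create collections might want to poll first. A generic retry sketch; `wait_until` is my own hypothetical name, and the example probes Solr's `/solr/admin/info/system` endpoint, though any cheap request would do.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 as soon as the command succeeds, 1 if it never does.
wait_until() {
    tries=$1 delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example after the restart:
#   wait_until 30 2 curl -sf "http://localhost:8983/solr/admin/info/system" || echo "Solr did not come up"
```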

Creating a Collection

Now we can create our first collection:

/opt/solr/bin/solr create -c posts --shards 2 -rf 2
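With `--shards 2 -rf 2` the collection is split into two shards, each held by two replicas, so Solr creates 2 × 2 = 4 cores. With a single Solr node like this one, all four presumably land on the same machine, which gives the SolrCloud layout but not real redundancy. The same collection can also be created through the Collections API; the sketch below only builds that URL. The helper names are mine, and I'm assuming the `_default` config set is acceptable, since that is what Solr falls back to when no config set is named.

```shell
#!/bin/sh
# Hypothetical helper: build a Collections API CREATE URL roughly equivalent to
# `bin/solr create -c NAME --shards N -rf R`. Only prints the URL.
create_collection_url() {
    base=$1 name=$2 shards=$3 rf=$4
    printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s\n' \
        "$base" "$name" "$shards" "$rf"
}

# Number of cores (replicas) Solr will create: shards x replication factor.
total_replicas() {
    echo $(( $1 * $2 ))
}

# Example:
#   curl "$(create_collection_url http://localhost:8983/solr posts 2 2)"
```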

With the posts collection created, I next want every field copied into _text_, the catch-all field that we will search against:

curl -X POST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' http://localhost:8983/solr/posts/schema
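Copy-field rules are applied at index time, which is why this step comes before loading any data; documents indexed earlier would have to be reindexed to pick up the rule. The request body is plain JSON, so a small helper (my own, `copy_field_payload`) can build it for other source/destination pairs. The Schema API also lists the current rules at `/schema/copyfields`.

```shell
#!/bin/sh
# Hypothetical helper: build a Schema API "add-copy-field" request body.
# Assumes SRC and DEST contain no characters that need JSON escaping.
copy_field_payload() {
    printf '{"add-copy-field": {"source": "%s", "dest": "%s"}}\n' "$1" "$2"
}

# Example:
#   curl -X POST -H 'Content-type:application/json' \
#        --data-binary "$(copy_field_payload '*' _text_)" \
#        http://localhost:8983/solr/posts/schema
#   curl "http://localhost:8983/solr/posts/schema/copyfields"   # list copy-field rules
```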

Finally, I loaded in some data:

/opt/solr/bin/solr post -c posts /home/username/BLOG/json/*

I exported my blog posts as JSON so that I could easily ingest everything.
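`bin/solr post` is a convenience wrapper; the same load can be done with plain curl against Solr's JSON update endpoint. A sketch, assuming each exported file holds a single JSON document or an array of documents (the exact shape of the export isn't shown here) and using `/update/json/docs`, followed by one commit. With `DRY_RUN=1` it only prints what it would send. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: POST each JSON file to a collection's /update/json/docs
# endpoint, then issue a single commit. With DRY_RUN=1, print the targets instead.
post_json_files() {
    base=$1 collection=$2
    shift 2
    url="$base/$collection/update/json/docs"
    for f in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "POST $f -> $url"
        else
            curl -sf -X POST -H 'Content-Type: application/json' \
                 --data-binary "@$f" "$url" || return 1
        fi
    done
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "COMMIT -> $base/$collection/update?commit=true"
    else
        curl -sf "$base/$collection/update?commit=true" >/dev/null
    fi
}

# Example:
#   DRY_RUN=1 post_json_files http://localhost:8983/solr posts /home/username/BLOG/json/*
#   post_json_files http://localhost:8983/solr posts /home/username/BLOG/json/*
```

Committing once at the end, rather than per file, keeps Solr from opening a new searcher for every document.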

Now I can do a quick search to test everything:

curl "http://localhost:8983/solr/posts/select?q=sola~1&hl=true&hl.fl=title%20markdown"

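This should return the documents that contain the word sola or any term within an edit distance of 1 of it (solr, for example). Since the query doesn't name a field, it runs against the default field, which the _default config set sets to _text_. The response should also include highlighted context fragments from the title and markdown fields.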
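To see why `sola~1` matches `solr`, here is a plain Levenshtein distance in awk, wrapped in a shell function. This is my own sketch, not Lucene's implementation: Lucene's fuzzy query by default also counts swapping two adjacent characters as a single edit, which plain Levenshtein scores as two.

```shell
#!/bin/sh
# Plain Levenshtein distance (insertions, deletions, substitutions each cost 1),
# computed with the classic dynamic-programming table in awk.
levenshtein() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a); m = length(b)
        for (i = 0; i <= n; i++) d[i, 0] = i
        for (j = 0; j <= m; j++) d[0, j] = j
        for (i = 1; i <= n; i++) {
            for (j = 1; j <= m; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                best = d[i-1, j] + 1                                       # drop a char of a
                if (d[i, j-1] + 1 < best) best = d[i, j-1] + 1             # insert a char of b
                if (d[i-1, j-1] + cost < best) best = d[i-1, j-1] + cost   # substitute or match
                d[i, j] = best
            }
        }
        print d[n, m]
    }'
}

# Terms within distance 1 of "sola" would match sola~1, e.g.:
#   levenshtein sola solr    # 1 (substitute a -> r)
#   levenshtein sola solar   # 1 (insert r)
```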