paper{s}pace
This is the documentation for version 2.x of paper{s}pace. You can find the documentation for the current version under latest documentation.
Documentation
Features
works with your existing files
paper{s}pace works with your existing file structure. Just point it to your documents folder and paper{s}pace will do its job. It doesn't matter in which subfolder a file is located: as long as its format is supported, it will be indexed and searchable within a couple of seconds.
full-text searching your files
Once paper{s}pace has found your files, it recognizes the text they contain, and you can search that content with its full-text search. Combine this with your tags and you will never miss an important document again.
tagging your PDFs
Adding a tag to a file is as easy as entering a name. paper{s}pace manages all tags on its own, so there is no need for complicated tag management.
reminds you of upcoming tasks
paper{s}pace supports a special type of document called a task. For each task, paper{s}pace reminds you by email one day in advance that you have to take action.
supported file formats
- PDF documents and PNG images
Usage
Most of the functionality paper{s}pace provides should be pretty self-explanatory, so I will only explain some of the features.
Search
The search field in the top right corner is your main entry point for finding your documents.
You can search for a simple string like "invoice", and this will return all documents which contain the word "invoice" in the document text, the description or the title. If you tagged your documents, you can narrow the results down by clicking on one or more of the tags in the left area.
You can prefix a word with + or - to include or exclude results containing that term. For example, searching for "invoice +2020" will return every document which contains the words "invoice" AND "2020". Searching for "invoice -2020" will return every document which contains the word "invoice" BUT NOT "2020".
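To make the +/- semantics concrete, here is a rough emulation with grep over a folder of plain-text files. It is only an illustration under simplifying assumptions: paper{s}pace answers queries from its Solr index, which tokenizes the text, whereas this hypothetical search_docs helper does case-insensitive substring matching.

```shell
#!/usr/bin/env bash
# Illustration only: emulates "term +word -word" over plain-text files with grep.
# search_docs is a hypothetical helper; paper{s}pace itself queries its Solr index.
# Usage: search_docs <dir> <term> [+required ...] [-excluded ...]
search_docs() {
  local dir=$1 term=$2 mod word f
  shift 2
  local -a files
  # Start with every file that contains the base term (case-insensitive substring match).
  mapfile -t files < <(grep -ril -- "$term" "$dir")
  for mod in "$@"; do
    word=${mod:1}
    local -a kept=()
    for f in "${files[@]}"; do
      case $mod in
        +*) grep -qi -- "$word" "$f" && kept+=("$f") ;;  # keep files that contain word
        -*) grep -qi -- "$word" "$f" || kept+=("$f") ;;  # keep files that lack word
      esac
    done
    files=("${kept[@]}")
  done
  # Print the remaining matches, one path per line (nothing if there are none).
  if ((${#files[@]})); then printf '%s\n' "${files[@]}"; fi
}
```

For example, search_docs ./texts invoice +2020 prints only the files that contain both words, mirroring the "invoice +2020" query described above.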
Installation
Attention: There is no upgrade path from version 1 of paper{s}pace. Please uninstall all previous installations.
There are two ways of installing paper{s}pace: by hand on a server, or with Docker, which is the easiest way. Only Docker is officially supported.
Docker
The easiest installation method is to check out or download one of the samples in examples and start the application with docker-compose.
There are two files:
- complete-with-ftp.yml
  - Includes the whole application plus an additional FTP server. You can point your document scanner to it and upload documents directly into paper{s}pace.
- minimal-docker-compose.yml
  - Includes only the application.
First decide which one you want. Copy it to your hard drive and rename it to docker-compose.yml. Then configure the properties you need in the environment block:
OCR_LANGUAGE: '<IN WHICH LANGUAGE ARE YOUR DOCUMENTS, FOR EXAMPLE deu? [string]>'
APPLICATION_HOST: 'REPLACE WITH HOSTNAME OR IP' # the URL under which paper{s}pace is reachable. Defaults to http://localhost:8080
ENABLE_MAIL: '<ENABLE MAILS? [true|false]>'
MAIL_TO_ADDRESS: '<WHO SHOULD RECEIVE MAILS? [string]>'
MAIL_FROM_ADDRESS: '<THE SENDER OF THE MAILS [string]>'
MAIL_ATTACH_DOCUMENTS: '<SHOULD WE ATTACH THE DOCUMENTS TO THE MAIL? [true|false]>'
MAILING_HOST: '<YOUR MAIL HOST THE APPLICATION SHOULD CONNECT TO [string]>'
MAILING_PORT: '<THE PORT OF YOUR MAIL HOST [integer]>'
MAILING_PROTOCOL: '<WHICH MAIL PROTOCOL SHOULD WE USE? [smtp|smtps]>'
MAILING_SMTP_AUTH: '<SHOULD WE AUTHENTICATE? [true|false]>'
MAILING_SMTP_USE_STARTTLS: '<SHOULD WE USE STARTTLS? [true|false]>'
MAILING_USERNAME: '<WHICH USER SHOULD CONNECT TO YOUR MAIL PROVIDER? [string]>'
MAILING_PASSWORD: '<THE PASSWORD OF THE MAIL USER [string]>'
If you don't want paper{s}pace to send emails at all, remove every key starting with MAIL_ or MAILING_, and either set ENABLE_MAIL: 'false' or remove that key as well.
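The same edit can be scripted. This sketch assumes the keys are written one per line in the KEY: 'value' form of the environment block above and that GNU sed is available (as on Ubuntu); disable_mail is a hypothetical helper, and it edits the file in place, so keep a copy first.

```shell
#!/usr/bin/env bash
# Removes all MAIL_* and MAILING_* keys from a compose file and sets ENABLE_MAIL to 'false'.
# disable_mail is a hypothetical helper; GNU sed -i edits the file in place.
# Assumes map-style entries (KEY: 'value'), not list-style entries (- KEY=value).
disable_mail() {
  sed -E -i \
    -e '/^[[:space:]]*MAIL(ING)?_[A-Z_]+:/d' \
    -e "s/^([[:space:]]*ENABLE_MAIL:).*/\1 'false'/" \
    -- "$1"
}
```

Usage: cp docker-compose.yml docker-compose.yml.bak && disable_mail docker-compose.yml. Lines such as OCR_LANGUAGE are left untouched because the delete pattern only matches keys that begin with MAIL_ or MAILING_.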
If you have chosen complete-with-ftp.yml, you also have to set the environment variable PASV_ADDRESS to the IP address or hostname under which the FTP server is reachable, for example PASV_ADDRESS: '192.168.1.111'. You should probably also change the password in VSFTPD_USER_1: 'scanner:password:9876:'. If you also want to change the username of the FTP user, change the mount point under volumes as well. For a complete list of configuration options, see wildscamp/vsftpd.
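Before starting the containers, it can help to check that no template placeholder was left behind. The following sketch assumes the placeholders look like the ones in the environment block above (a quoted value starting with '<', or the text REPLACE WITH); check_placeholders is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Prints every line (with its line number) of a compose file that still contains
# a template placeholder. check_placeholders is a hypothetical helper;
# no output means no placeholder was found.
check_placeholders() {
  grep -nE -e "'<[^>]*>|REPLACE WITH" "$1"
}
```

Run it as check_placeholders docker-compose.yml. Afterwards, docker-compose config -q validates the file without printing the resolved configuration.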
After you have made the changes to the configuration, you can start paper{s}pace with docker-compose. Open a terminal, navigate to the folder where your docker-compose.yml resides and execute
docker-compose up -d
This will start paper{s}pace locally, and you can open it by navigating your browser to http://localhost:8080.
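Because the containers start in the background, the page may not answer immediately. A small polling loop can wait for it; this is a sketch assuming curl is installed, and wait_for_app is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Polls a URL roughly once per second until it returns any HTTP response or the timeout expires.
# wait_for_app is a hypothetical helper; defaults: http://localhost:8080 and 120 seconds.
wait_for_app() {
  local url=${1:-http://localhost:8080} timeout=${2:-120} waited=0
  # Without -f, curl exits 0 for any HTTP status; it fails only if nothing answers.
  until curl -s -o /dev/null --max-time 5 "$url"; do
    if ((waited >= timeout)); then
      echo "no response from $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$url is up"
}
```

If wait_for_app times out, docker-compose ps shows whether the containers are running and docker-compose logs api shows the output of the application service.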
using existing documents
In the sample configurations we work with a named volume. If you already have a folder with documents you want to use, you have to mount it into the container. paper{s}pace expects the following folders to be writable by the user with the uid 9876:
/storage/tasks #default location for task documents
/storage/documents #default location for documents
/storage/binary #storage location for preview images of documents
/storage/database #location of the database file
To work with your existing folders, you have to change the volumes section of the service api in the docker-compose.yml. If you have chosen the compose file with the FTP server, you also have to change the mount point of the service ftp.
Let's assume your current documents reside under /home/paperspace/documents/ and your tasks will be stored under /home/paperspace/tasks/. Then you have to change the volumes sections in your docker-compose.yml like this:
version: "3.4"
services:
  api:
    ...
    volumes:
      - paperspace:/storage
      - /home/paperspace/tasks/:/storage/tasks
      - /home/paperspace/documents:/storage/documents
    ...
  ftp:
    ...
    volumes:
      - /home/paperspace:/home/virtual/scanner/data
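The mounted host folders must be writable by uid 9876. A minimal sketch for creating them and handing them over to that uid, assuming the example paths above; prepare_host_folders is a hypothetical helper name, and changing the owner to a different user requires root.

```shell
#!/usr/bin/env bash
# Creates the given folders (if missing) and recursively makes the given uid their owner.
# prepare_host_folders is a hypothetical helper; chown to a different user needs root.
prepare_host_folders() {
  local uid=$1
  shift
  mkdir -p -- "$@" && chown -R -- "$uid" "$@"
}
```

As root, this would be called as prepare_host_folders 9876 /home/paperspace/documents /home/paperspace/tasks. The equivalent one-off commands are sudo mkdir -p /home/paperspace/documents /home/paperspace/tasks followed by sudo chown -R 9876 /home/paperspace/documents /home/paperspace/tasks.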
Bare Metal Installation from Source
Install required software
This tutorial assumes a system based on Ubuntu 20.04 LTS. If you are running a different distribution, please adapt the commands accordingly.
All following commands assume that your documents are in German. If your documents are in a different language, replace deu with your language code. For example, for Spanish you would install tesseract-ocr-spa instead of tesseract-ocr-deu.
sudo apt-get install tesseract-ocr tesseract-ocr-deu openjdk-11-jdk-headless git npm python3-pip && pip3 install stapler
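To confirm that Tesseract picked up your language data, you can ask it for its installed languages. has_ocr_lang is a hypothetical helper around tesseract --list-langs; it assumes the language codes are printed one per line after a header line, as Tesseract 4 does.

```shell
#!/usr/bin/env bash
# Succeeds if Tesseract reports the given language code (e.g. deu) as installed.
# has_ocr_lang is a hypothetical helper; stderr is merged in because some
# Tesseract versions print the language list there.
has_ocr_lang() {
  tesseract --list-langs 2>&1 | grep -qx -- "$1"
}
```

For example, has_ocr_lang deu && echo "German OCR data found". Similarly, java -version should report a Java 11 runtime after the installation.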
fetch latest source code
git clone https://gitlab.com/dedicatedcode/paperspace.git
Install Solr
cd /opt
sudo wget https://archive.apache.org/dist/lucene/solr/8.3.1/solr-8.3.1.tgz
sudo tar xzf solr-8.3.1.tgz solr-8.3.1/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-8.3.1.tgz
create solr data directory referenced in search/config/conf/core.properties (you can change this to your preferred location)
sudo mkdir -p /data/solr/
sudo chown solr:solr /data/solr
copy solr configuration
sudo cp -r paperspace/search/config/conf /var/solr/data/core_documents
restart solr
sudo systemctl restart solr.service
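To check that Solr loaded the copied configuration, you can query its CoreAdmin API. The sketch below assumes the core is called core_documents (like the directory it was copied into) and that Solr listens on its default port 8983, matching search.port in application.properties; solr_core_loaded is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Succeeds if Solr's CoreAdmin STATUS response contains details (instanceDir) for the core.
# A core that is not loaded has no instanceDir in its status entry.
# solr_core_loaded is a hypothetical helper; defaults assume core_documents on port 8983.
solr_core_loaded() {
  local core=${1:-core_documents} base=${2:-http://localhost:8983/solr}
  curl -s --max-time 5 "$base/admin/cores?action=STATUS&core=$core&wt=json" \
    | grep -q '"instanceDir"'
}
```

For example, solr_core_loaded && echo "core is loaded". If it fails, the Solr log (by default /var/solr/logs/solr.log for the service installation) usually names the problem.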
Build and install paper{s}pace
install app
- build app
cd paperspace/api && ./gradlew build
- create user, folder and copy app
sudo useradd -M -s /bin/false paperspace
sudo mkdir /opt/paperspace-app
sudo cp paperspace/api/build/libs/api.jar /opt/paperspace-app/app.jar
sudo chown -R paperspace:paperspace /opt/paperspace-app
- create application.properties with this content
database.location=database/paperspace.db
search.host=localhost
search.port=8983
ocr.language=deu
app.host=http://localhost:8080
ocr.datapath=/usr/share/tesseract-ocr/4.00/tessdata
storage.folder.tasks=/storage/tasks
storage.folder.documents=/storage/documents
storage.folder.binaries=binary
### REMOVE IF YOU DON'T WANT TO USE MAILS ###
email.enabled=true
email.target-address=
email.sender-address=
email.attach_documents=false
spring.mail.host=
spring.mail.port=
spring.mail.protocol=smtp
spring.mail.test-connection=false
spring.mail.properties.mail.smtp.auth=true
spring.mail.properties.mail.smtp.starttls.enable=true
spring.mail.username=
spring.mail.password=
Adjust the property ocr.language to your language code and the property app.host to the address under which the application is reachable. Replace /storage/tasks and /storage/documents with the paths to your documents. If you want paper{s}pace to send you emails about upcoming tasks or new documents, fill in the properties starting with email and spring.mail. If you don't need emails, simply delete the whole mail block at the end of application.properties. Afterwards, your /opt/paperspace-app folder should look like this:
/opt/paperspace-app/
├── app.jar
└── application.properties
- create service files
sudo tee <<EOF /etc/systemd/system/paperspace-app.service >/dev/null
[Unit]
Description=paperspace-app
After=syslog.target

[Service]
User=paperspace
WorkingDirectory=/opt/paperspace-app
ExecStart=java -jar /opt/paperspace-app/app.jar
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target
EOF
reload systemd configurations
sudo systemctl daemon-reload
- start services
sudo systemctl enable paperspace-app
sudo systemctl start paperspace-app
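If you kept the mail block, it is easy to leave one of its keys empty. Before starting the service, or when it does not come up, a small check can list every key in application.properties that has no value. empty_properties is a hypothetical helper; it skips lines starting with # and assumes simple key=value lines.

```shell
#!/usr/bin/env bash
# Prints the keys of a .properties file whose value is empty, e.g. "spring.mail.host=".
# empty_properties is a hypothetical helper; lines starting with # are skipped.
empty_properties() {
  grep -E -e '^[^#][^=]*=[[:space:]]*$' "$1" | cut -d= -f1
}
```

Usage: empty_properties /opt/paperspace-app/application.properties. With the template above and no mail settings filled in, it would list keys such as spring.mail.host and spring.mail.port.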
If everything is set up, the app should now be accessible at http://localhost:8080 and greet you with an empty result. You can now start putting PDFs into the documents or tasks folder.
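If the page does not come up, systemd can tell you whether the service is running. check_service is a hypothetical helper built on systemctl is-active and journalctl; it assumes the unit name paperspace-app from the service file above and prints the last log lines of a unit that is not active.

```shell
#!/usr/bin/env bash
# Reports whether a systemd unit is active; if not, prints its last 20 journal lines to stderr.
# check_service is a hypothetical helper; the default unit name is paperspace-app.
check_service() {
  local unit=${1:-paperspace-app}
  if systemctl is-active --quiet "$unit"; then
    echo "$unit is running"
  else
    echo "$unit is not running; last log lines:" >&2
    journalctl -u "$unit" -n 20 --no-pager >&2
    return 1
  fi
}
```

Reading the journal of another unit may require sudo or membership in the systemd-journal group.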